Netdev Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH net-next v6 0/8] net/sched: Add txtime-assist support for taprio.
@ 2019-06-25 22:07 Vedang Patel
  2019-06-25 22:07 ` [PATCH net-next v6 1/8] igb: clear out skb->tstamp after reading the txtime Vedang Patel
                   ` (8 more replies)
  0 siblings, 9 replies; 13+ messages in thread
From: Vedang Patel @ 2019-06-25 22:07 UTC (permalink / raw)
  To: netdev
  Cc: jeffrey.t.kirsher, davem, jhs, xiyou.wangcong, jiri,
	intel-wired-lan, vinicius.gomes, l, jakub.kicinski, m-karicheri2,
	sergei.shtylyov, eric.dumazet, aaron.f.brown, Vedang Patel

Changes in v6:
- Use _BITUL() instead of BIT() in UAPI for etf. (patch #1)
- Fix a bug reported by kbuild test bot in length_to_duration(). (patch #6)
- Remove an unused function (get_cycle_start()). (Patch #6)

Changes in v5:
- Commit message improved for the igb patch (patch #1).
- Fixed typo in commit message for etf patch (patch #2).

Changes in v4:
- Remove inline directive from functions in foo.c.
- Fix spacing in pkt_sched.h (for etf patch).

Changes in v3:
- Simplify implementation for taprio flags. 
- txtime_delay can only be set if txtime-assist mode is enabled.
- txtime_delay and flags will only be visible in tc output if set by user.
- Minor changes in error reporting.

Changes in v2:
- Txtime-offload has now been renamed to txtime-assist mode.
- Renamed the offload parameter to flags.
- Removed the code which introduced the hardware offloading functionality.

Original Cover letter (with above changes included)
--------------------------------------------------

Currently, we are seeing packets being transmitted outside their
timeslices. We can confirm that the packets are being dequeued at the right
time. So, the delay is induced after the packet is dequeued, because
taprio, without any offloading, has no control of when a packet is actually
transmitted.

In order to solve this, we are making use of the txtime feature provided by
ETF qdisc. Hardware offloading needs to be supported by the ETF qdisc in
order to take advantage of this feature. The taprio qdisc will assign
txtime (in skb->tstamp) for all the packets which do not have the txtime
allocated via the SO_TXTIME socket option. For the packets which already
have SO_TXTIME set, taprio will validate whether the packet will be
transmitted in the correct interval.

In order to support this, the following parameters have been added:
- flags (taprio): This is added in order to support different offloading
  modes which will be added in the future.
- txtime-delay (taprio): This indicates the minimum time it will take for
  the packet to hit the wire after it reaches taprio_enqueue(). This is
  useful in determining whether we can transmit the packet in the remaining
  time if the gate corresponding to the packet is currently open.
- skip_skb_check (ETF): ETF currently drops any packet which does not have
  the SO_TXTIME socket option set. This check can be skipped by specifying
  this option.

Following is an example configuration:

tc qdisc replace dev $IFACE parent root handle 100 taprio \\
    num_tc 3 \\
    map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \\
    queues 1@0 1@0 1@0 \\
    base-time $BASE_TIME \\
    sched-entry S 01 300000 \\
    sched-entry S 02 300000 \\
    sched-entry S 04 400000 \\
    flags 0x1 \\
    txtime-delay 200000 \\
    clockid CLOCK_TAI

tc qdisc replace dev $IFACE parent 100:1 etf \\
    offload delta 200000 clockid CLOCK_TAI skip_skb_check

Here, the "flags" parameter is indicating that the txtime-assist mode is
enabled. Also, all the traffic classes have been assigned the same queue.
This is to prevent the traffic classes in the lower priority queues from
getting starved. Note that this configuration is specific to the i210
ethernet card. Other network cards where the hardware queues are given the
same priority, might be able to utilize more than one queue.

Following are some of the other highlights of the series:
- Fix a bug where hardware timestamping and SO_TXTIME options cannot be
  used together. (Patch 1)
- Introduces the skip_skb_check option.  (Patch 2)
- Make TxTime assist mode work with TCP packets (Patch 7).

The following changes are recommended to be done in order to get the best
performance from taprio in this mode:
# TSN in general does not allow Jumbo frames.
ip link set dev enp1s0 mtu 1514
# Disable segmentation offload. This is to prevent NIC from sending packets
# after the gate for a traffic class has closed.
ethtool -K eth0 gso off 
ethtool -K eth0 tso off
# Disable energy efficient ethernet to make sure there are no latency
# spikes when NIC is trying to wake up when the packet is supposed to be
# sent.
ethtool --set-eee eth0 eee off

Thanks,
Vedang Patel

Vedang Patel (8):
  igb: clear out skb->tstamp after reading the txtime
  etf: Don't use BIT() in UAPI headers.
  etf: Add skip_sock_check
  taprio: calculate cycle_time when schedule is installed
  taprio: Remove inline directive
  taprio: Add support for txtime-assist mode
  taprio: make clock reference conversions easier
  taprio: Adjust timestamps for TCP packets

 drivers/net/ethernet/intel/igb/igb_main.c |   1 +
 include/uapi/linux/pkt_sched.h            |   9 +-
 net/sched/sch_etf.c                       |  10 +
 net/sched/sch_taprio.c                    | 421 +++++++++++++++++++++++++++---
 4 files changed, 405 insertions(+), 36 deletions(-)

-- 
2.7.3


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH net-next v6 1/8] igb: clear out skb->tstamp after reading the txtime
  2019-06-25 22:07 [PATCH net-next v6 0/8] net/sched: Add txtime-assist support for taprio Vedang Patel
@ 2019-06-25 22:07 ` Vedang Patel
  2019-06-26 19:44   ` Jeff Kirsher
  2019-06-28  3:01   ` Brown, Aaron F
  2019-06-25 22:07 ` [PATCH net-next v6 2/8] etf: Don't use BIT() in UAPI headers Vedang Patel
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 13+ messages in thread
From: Vedang Patel @ 2019-06-25 22:07 UTC (permalink / raw)
  To: netdev
  Cc: jeffrey.t.kirsher, davem, jhs, xiyou.wangcong, jiri,
	intel-wired-lan, vinicius.gomes, l, jakub.kicinski, m-karicheri2,
	sergei.shtylyov, eric.dumazet, aaron.f.brown, Vedang Patel

If a packet which is utilizing the launchtime feature (via SO_TXTIME socket
option) also requests the hardware transmit timestamp, the hardware
timestamp is not delivered to the userspace. This is because the value in
skb->tstamp is mistaken as the software timestamp.

Applications, like ptp4l, request a hardware timestamp by setting the
SOF_TIMESTAMPING_TX_HARDWARE socket option. Whenever a new timestamp is
detected by the driver (this work is done in igb_ptp_tx_work() which calls
igb_ptp_tx_hwtstamps() in igb_ptp.c[1]), it will queue the timestamp in the
ERR_QUEUE for the userspace to read. When the userspace is ready, it will
issue a recvmsg() call to collect this timestamp.  The problem is in this
recvmsg() call. If the skb->tstamp is not cleared out, it will be
interpreted as a software timestamp and the hardware tx timestamp will not
be successfully sent to the userspace. Look at skb_is_swtx_tstamp() and the
callee function __sock_recv_timestamp() in net/socket.c for more details.

Signed-off-by: Vedang Patel <vedang.patel@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index fc925adbd9fa..f66dae72fe37 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -5688,6 +5688,7 @@ static void igb_tx_ctxtdesc(struct igb_ring *tx_ring,
 	 */
 	if (tx_ring->launchtime_enable) {
 		ts = ns_to_timespec64(first->skb->tstamp);
+		first->skb->tstamp = 0;
 		context_desc->seqnum_seed = cpu_to_le32(ts.tv_nsec / 32);
 	} else {
 		context_desc->seqnum_seed = 0;
-- 
2.7.3


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH net-next v6 2/8] etf: Don't use BIT() in UAPI headers.
  2019-06-25 22:07 [PATCH net-next v6 0/8] net/sched: Add txtime-assist support for taprio Vedang Patel
  2019-06-25 22:07 ` [PATCH net-next v6 1/8] igb: clear out skb->tstamp after reading the txtime Vedang Patel
@ 2019-06-25 22:07 ` Vedang Patel
  2019-06-25 22:07 ` [PATCH net-next v6 3/8] etf: Add skip_sock_check Vedang Patel
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Vedang Patel @ 2019-06-25 22:07 UTC (permalink / raw)
  To: netdev
  Cc: jeffrey.t.kirsher, davem, jhs, xiyou.wangcong, jiri,
	intel-wired-lan, vinicius.gomes, l, jakub.kicinski, m-karicheri2,
	sergei.shtylyov, eric.dumazet, aaron.f.brown, Vedang Patel

The BIT() macro isn't exported as part of the UAPI interface. So, the
compile-test to ensure they are self contained fails. So, use _BITUL()
instead.

Signed-off-by: Vedang Patel <vedang.patel@intel.com>
---
 include/uapi/linux/pkt_sched.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 8b2f993cbb77..f88c4e0bd9e5 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -988,8 +988,8 @@ struct tc_etf_qopt {
 	__s32 delta;
 	__s32 clockid;
 	__u32 flags;
-#define TC_ETF_DEADLINE_MODE_ON	BIT(0)
-#define TC_ETF_OFFLOAD_ON	BIT(1)
+#define TC_ETF_DEADLINE_MODE_ON	_BITUL(0)
+#define TC_ETF_OFFLOAD_ON	_BITUL(1)
 };
 
 enum {
-- 
2.7.3


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH net-next v6 3/8] etf: Add skip_sock_check
  2019-06-25 22:07 [PATCH net-next v6 0/8] net/sched: Add txtime-assist support for taprio Vedang Patel
  2019-06-25 22:07 ` [PATCH net-next v6 1/8] igb: clear out skb->tstamp after reading the txtime Vedang Patel
  2019-06-25 22:07 ` [PATCH net-next v6 2/8] etf: Don't use BIT() in UAPI headers Vedang Patel
@ 2019-06-25 22:07 ` Vedang Patel
  2019-06-25 22:07 ` [PATCH net-next v6 4/8] taprio: calculate cycle_time when schedule is installed Vedang Patel
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Vedang Patel @ 2019-06-25 22:07 UTC (permalink / raw)
  To: netdev
  Cc: jeffrey.t.kirsher, davem, jhs, xiyou.wangcong, jiri,
	intel-wired-lan, vinicius.gomes, l, jakub.kicinski, m-karicheri2,
	sergei.shtylyov, eric.dumazet, aaron.f.brown, Vedang Patel

Currently, etf expects a socket with SO_TXTIME option set for each packet
it encounters. So, it will drop all other packets. But, in the future
commits we are planning to add functionality where tstamp value will be set
by another qdisc. Also, some packets which are generated from within the
kernel (e.g. ICMP packets) do not have any socket associated with them.

So, this commit adds support for skip_sock_check. When this option is set,
etf will skip checking for a socket and other associated options for all
skbs.

Signed-off-by: Vedang Patel <vedang.patel@intel.com>
---
 include/uapi/linux/pkt_sched.h |  1 +
 net/sched/sch_etf.c            | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index f88c4e0bd9e5..127ac6d2888c 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -990,6 +990,7 @@ struct tc_etf_qopt {
 	__u32 flags;
 #define TC_ETF_DEADLINE_MODE_ON	_BITUL(0)
 #define TC_ETF_OFFLOAD_ON	_BITUL(1)
+#define TC_ETF_SKIP_SOCK_CHECK	_BITUL(2)
 };
 
 enum {
diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
index db0c2ba1d156..cebfb65d8556 100644
--- a/net/sched/sch_etf.c
+++ b/net/sched/sch_etf.c
@@ -22,10 +22,12 @@
 
 #define DEADLINE_MODE_IS_ON(x) ((x)->flags & TC_ETF_DEADLINE_MODE_ON)
 #define OFFLOAD_IS_ON(x) ((x)->flags & TC_ETF_OFFLOAD_ON)
+#define SKIP_SOCK_CHECK_IS_SET(x) ((x)->flags & TC_ETF_SKIP_SOCK_CHECK)
 
 struct etf_sched_data {
 	bool offload;
 	bool deadline_mode;
+	bool skip_sock_check;
 	int clockid;
 	int queue;
 	s32 delta; /* in ns */
@@ -77,6 +79,9 @@ static bool is_packet_valid(struct Qdisc *sch, struct sk_buff *nskb)
 	struct sock *sk = nskb->sk;
 	ktime_t now;
 
+	if (q->skip_sock_check)
+		goto skip;
+
 	if (!sk)
 		return false;
 
@@ -92,6 +97,7 @@ static bool is_packet_valid(struct Qdisc *sch, struct sk_buff *nskb)
 	if (sk->sk_txtime_deadline_mode != q->deadline_mode)
 		return false;
 
+skip:
 	now = q->get_time();
 	if (ktime_before(txtime, now) || ktime_before(txtime, q->last))
 		return false;
@@ -385,6 +391,7 @@ static int etf_init(struct Qdisc *sch, struct nlattr *opt,
 	q->clockid = qopt->clockid;
 	q->offload = OFFLOAD_IS_ON(qopt);
 	q->deadline_mode = DEADLINE_MODE_IS_ON(qopt);
+	q->skip_sock_check = SKIP_SOCK_CHECK_IS_SET(qopt);
 
 	switch (q->clockid) {
 	case CLOCK_REALTIME:
@@ -473,6 +480,9 @@ static int etf_dump(struct Qdisc *sch, struct sk_buff *skb)
 	if (q->deadline_mode)
 		opt.flags |= TC_ETF_DEADLINE_MODE_ON;
 
+	if (q->skip_sock_check)
+		opt.flags |= TC_ETF_SKIP_SOCK_CHECK;
+
 	if (nla_put(skb, TCA_ETF_PARMS, sizeof(opt), &opt))
 		goto nla_put_failure;
 
-- 
2.7.3


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH net-next v6 4/8] taprio: calculate cycle_time when schedule is installed
  2019-06-25 22:07 [PATCH net-next v6 0/8] net/sched: Add txtime-assist support for taprio Vedang Patel
                   ` (2 preceding siblings ...)
  2019-06-25 22:07 ` [PATCH net-next v6 3/8] etf: Add skip_sock_check Vedang Patel
@ 2019-06-25 22:07 ` Vedang Patel
  2019-06-25 22:07 ` [PATCH net-next v6 5/8] taprio: Remove inline directive Vedang Patel
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Vedang Patel @ 2019-06-25 22:07 UTC (permalink / raw)
  To: netdev
  Cc: jeffrey.t.kirsher, davem, jhs, xiyou.wangcong, jiri,
	intel-wired-lan, vinicius.gomes, l, jakub.kicinski, m-karicheri2,
	sergei.shtylyov, eric.dumazet, aaron.f.brown, Vedang Patel

cycle time for a particular schedule is calculated only when it is first
installed. So, it makes sense to just calculate it once right after the
'cycle_time' parameter has been parsed and store it in cycle_time.

Signed-off-by: Vedang Patel <vedang.patel@intel.com>
---
 net/sched/sch_taprio.c | 29 +++++++++++------------------
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 9ecfb8f5902a..a41d7d4434ee 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -108,22 +108,6 @@ static void switch_schedules(struct taprio_sched *q,
 	*admin = NULL;
 }
 
-static ktime_t get_cycle_time(struct sched_gate_list *sched)
-{
-	struct sched_entry *entry;
-	ktime_t cycle = 0;
-
-	if (sched->cycle_time != 0)
-		return sched->cycle_time;
-
-	list_for_each_entry(entry, &sched->entries, list)
-		cycle = ktime_add_ns(cycle, entry->interval);
-
-	sched->cycle_time = cycle;
-
-	return cycle;
-}
-
 static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 			  struct sk_buff **to_free)
 {
@@ -524,6 +508,15 @@ static int parse_taprio_schedule(struct nlattr **tb,
 	if (err < 0)
 		return err;
 
+	if (!new->cycle_time) {
+		struct sched_entry *entry;
+		ktime_t cycle = 0;
+
+		list_for_each_entry(entry, &new->entries, list)
+			cycle = ktime_add_ns(cycle, entry->interval);
+		new->cycle_time = cycle;
+	}
+
 	return 0;
 }
 
@@ -605,7 +598,7 @@ static int taprio_get_start_time(struct Qdisc *sch,
 		return 0;
 	}
 
-	cycle = get_cycle_time(sched);
+	cycle = sched->cycle_time;
 
 	/* The qdisc is expected to have at least one sched_entry.  Moreover,
 	 * any entry must have 'interval' > 0. Thus if the cycle time is zero,
@@ -632,7 +625,7 @@ static void setup_first_close_time(struct taprio_sched *q,
 	first = list_first_entry(&sched->entries,
 				 struct sched_entry, list);
 
-	cycle = get_cycle_time(sched);
+	cycle = sched->cycle_time;
 
 	/* FIXME: find a better place to do this */
 	sched->cycle_close_time = ktime_add_ns(base, cycle);
-- 
2.7.3


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH net-next v6 5/8] taprio: Remove inline directive
  2019-06-25 22:07 [PATCH net-next v6 0/8] net/sched: Add txtime-assist support for taprio Vedang Patel
                   ` (3 preceding siblings ...)
  2019-06-25 22:07 ` [PATCH net-next v6 4/8] taprio: calculate cycle_time when schedule is installed Vedang Patel
@ 2019-06-25 22:07 ` Vedang Patel
  2019-06-25 22:07 ` [PATCH net-next v6 6/8] taprio: Add support for txtime-assist mode Vedang Patel
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Vedang Patel @ 2019-06-25 22:07 UTC (permalink / raw)
  To: netdev
  Cc: jeffrey.t.kirsher, davem, jhs, xiyou.wangcong, jiri,
	intel-wired-lan, vinicius.gomes, l, jakub.kicinski, m-karicheri2,
	sergei.shtylyov, eric.dumazet, aaron.f.brown, Vedang Patel

Remove inline directive from length_to_duration(). We will let the compiler
make the decisions.

Signed-off-by: Vedang Patel <vedang.patel@intel.com>
---
 net/sched/sch_taprio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index a41d7d4434ee..6ef0cc03fdb9 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -168,7 +168,7 @@ static struct sk_buff *taprio_peek(struct Qdisc *sch)
 	return NULL;
 }
 
-static inline int length_to_duration(struct taprio_sched *q, int len)
+static int length_to_duration(struct taprio_sched *q, int len)
 {
 	return div_u64(len * atomic64_read(&q->picos_per_byte), 1000);
 }
-- 
2.7.3


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH net-next v6 6/8] taprio: Add support for txtime-assist mode
  2019-06-25 22:07 [PATCH net-next v6 0/8] net/sched: Add txtime-assist support for taprio Vedang Patel
                   ` (4 preceding siblings ...)
  2019-06-25 22:07 ` [PATCH net-next v6 5/8] taprio: Remove inline directive Vedang Patel
@ 2019-06-25 22:07 ` Vedang Patel
  2019-06-25 22:07 ` [PATCH net-next v6 7/8] taprio: make clock reference conversions easier Vedang Patel
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Vedang Patel @ 2019-06-25 22:07 UTC (permalink / raw)
  To: netdev
  Cc: jeffrey.t.kirsher, davem, jhs, xiyou.wangcong, jiri,
	intel-wired-lan, vinicius.gomes, l, jakub.kicinski, m-karicheri2,
	sergei.shtylyov, eric.dumazet, aaron.f.brown, Vedang Patel

Currently, we are seeing non-critical packets being transmitted outside of
their timeslice. We can confirm that the packets are being dequeued at the
right time. So, the delay is induced in the hardware side.  The most likely
reason is the hardware queues are starving the lower priority queues.

In order to improve the performance of taprio, we will be making use of the
txtime feature provided by the ETF qdisc. For all the packets which do not
have the SO_TXTIME option set, taprio will set the transmit timestamp (set
in skb->tstamp) in this mode. TAPrio Qdisc will ensure that the transmit
time for the packet is set to when the gate is open. If SO_TXTIME is set,
the TAPrio qdisc will validate whether the timestamp (in skb->tstamp)
occurs when the gate corresponding to skb's traffic class is open.

Following two parameters added to support this mode:
- flags: used to enable txtime-assist mode. Will also be used to enable
  other modes (like hardware offloading) later.
- txtime-delay: This indicates the minimum time it will take for the packet
  to hit the wire. This is useful in determining whether we can transmit
the packet in the remaining time if the gate corresponding to the packet is
currently open.

An example configuration for enabling txtime-assist:

tc qdisc replace dev eth0 parent root handle 100 taprio \\
      num_tc 3 \\
      map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \\
      queues 1@0 1@0 1@0 \\
      base-time 1558653424279842568 \\
      sched-entry S 01 300000 \\
      sched-entry S 02 300000 \\
      sched-entry S 04 400000 \\
      flags 0x1 \\
      txtime-delay 40000 \\
      clockid CLOCK_TAI

tc qdisc replace dev $IFACE parent 100:1 etf skip_sock_check \\
      offload delta 200000 clockid CLOCK_TAI

Note that all the traffic classes are mapped to the same queue.  This is
only possible in taprio when txtime-assist is enabled. Also, note that the
ETF Qdisc is enabled with offload mode set.

In this mode, if the packet's traffic class is open and the complete packet
can be transmitted, taprio will try to transmit the packet immediately.
This will be done by setting skb->tstamp to current_time + the time delta
indicated in the txtime-delay parameter. This parameter indicates the time
taken (in software) for packet to reach the network adapter.

If the packet cannot be transmitted in the current interval or if the
packet's traffic is not currently transmitting, the skb->tstamp is set to
the next available timestamp value. This is tracked in the next_launchtime
parameter in the struct sched_entry.

The behaviour w.r.t admin and oper schedules is not changed from what is
present in software mode.

The transmit time is already known in advance. So, we do not need the HR
timers to advance the schedule and wakeup the dequeue side of taprio.  So,
HR timer won't be run when this mode is enabled.

Signed-off-by: Vedang Patel <vedang.patel@intel.com>
---
 include/uapi/linux/pkt_sched.h |   4 +
 net/sched/sch_taprio.c         | 341 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 328 insertions(+), 17 deletions(-)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 127ac6d2888c..390efb54b2e0 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -1159,6 +1159,8 @@ enum {
  *       [TCA_TAPRIO_ATTR_SCHED_ENTRY_INTERVAL]
  */
 
+#define TCA_TAPRIO_ATTR_FLAG_TXTIME_ASSIST 0x1
+
 enum {
 	TCA_TAPRIO_ATTR_UNSPEC,
 	TCA_TAPRIO_ATTR_PRIOMAP, /* struct tc_mqprio_qopt */
@@ -1170,6 +1172,8 @@ enum {
 	TCA_TAPRIO_ATTR_ADMIN_SCHED, /* The admin sched, only used in dump */
 	TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME, /* s64 */
 	TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION, /* s64 */
+	TCA_TAPRIO_ATTR_FLAGS, /* u32 */
+	TCA_TAPRIO_ATTR_TXTIME_DELAY, /* s32 */
 	__TCA_TAPRIO_ATTR_MAX,
 };
 
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 6ef0cc03fdb9..078230e44471 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -21,12 +21,16 @@
 #include <net/pkt_sched.h>
 #include <net/pkt_cls.h>
 #include <net/sch_generic.h>
+#include <net/sock.h>
 
 static LIST_HEAD(taprio_list);
 static DEFINE_SPINLOCK(taprio_list_lock);
 
 #define TAPRIO_ALL_GATES_OPEN -1
 
+#define FLAGS_VALID(flags) (!((flags) & ~TCA_TAPRIO_ATTR_FLAG_TXTIME_ASSIST))
+#define TXTIME_ASSIST_IS_ENABLED(flags) ((flags) & TCA_TAPRIO_ATTR_FLAG_TXTIME_ASSIST)
+
 struct sched_entry {
 	struct list_head list;
 
@@ -35,6 +39,7 @@ struct sched_entry {
 	 * packet leaves after this time.
 	 */
 	ktime_t close_time;
+	ktime_t next_txtime;
 	atomic_t budget;
 	int index;
 	u32 gate_mask;
@@ -55,6 +60,7 @@ struct sched_gate_list {
 struct taprio_sched {
 	struct Qdisc **qdiscs;
 	struct Qdisc *root;
+	u32 flags;
 	int clockid;
 	atomic64_t picos_per_byte; /* Using picoseconds because for 10Gbps+
 				    * speeds it's sub-nanoseconds per byte
@@ -68,6 +74,7 @@ struct taprio_sched {
 	ktime_t (*get_time)(void);
 	struct hrtimer advance_timer;
 	struct list_head taprio_list;
+	int txtime_delay;
 };
 
 static ktime_t sched_base_time(const struct sched_gate_list *sched)
@@ -108,6 +115,227 @@ static void switch_schedules(struct taprio_sched *q,
 	*admin = NULL;
 }
 
+/* Get how much time has been already elapsed in the current cycle. */
+static s32 get_cycle_time_elapsed(struct sched_gate_list *sched, ktime_t time)
+{
+	ktime_t time_since_sched_start;
+	s32 time_elapsed;
+
+	time_since_sched_start = ktime_sub(time, sched->base_time);
+	div_s64_rem(time_since_sched_start, sched->cycle_time, &time_elapsed);
+
+	return time_elapsed;
+}
+
+static ktime_t get_interval_end_time(struct sched_gate_list *sched,
+				     struct sched_gate_list *admin,
+				     struct sched_entry *entry,
+				     ktime_t intv_start)
+{
+	s32 cycle_elapsed = get_cycle_time_elapsed(sched, intv_start);
+	ktime_t intv_end, cycle_ext_end, cycle_end;
+
+	cycle_end = ktime_add_ns(intv_start, sched->cycle_time - cycle_elapsed);
+	intv_end = ktime_add_ns(intv_start, entry->interval);
+	cycle_ext_end = ktime_add(cycle_end, sched->cycle_time_extension);
+
+	if (ktime_before(intv_end, cycle_end))
+		return intv_end;
+	else if (admin && admin != sched &&
+		 ktime_after(admin->base_time, cycle_end) &&
+		 ktime_before(admin->base_time, cycle_ext_end))
+		return admin->base_time;
+	else
+		return cycle_end;
+}
+
+static int length_to_duration(struct taprio_sched *q, int len)
+{
+	return div_u64(len * atomic64_read(&q->picos_per_byte), 1000);
+}
+
+/* Returns the entry corresponding to next available interval. If
+ * validate_interval is set, it only validates whether the timestamp occurs
+ * when the gate corresponding to the skb's traffic class is open.
+ */
+static struct sched_entry *find_entry_to_transmit(struct sk_buff *skb,
+						  struct Qdisc *sch,
+						  struct sched_gate_list *sched,
+						  struct sched_gate_list *admin,
+						  ktime_t time,
+						  ktime_t *interval_start,
+						  ktime_t *interval_end,
+						  bool validate_interval)
+{
+	ktime_t curr_intv_start, curr_intv_end, cycle_end, packet_transmit_time;
+	ktime_t earliest_txtime = KTIME_MAX, txtime, cycle, transmit_end_time;
+	struct sched_entry *entry = NULL, *entry_found = NULL;
+	struct taprio_sched *q = qdisc_priv(sch);
+	struct net_device *dev = qdisc_dev(sch);
+	bool entry_available = false;
+	s32 cycle_elapsed;
+	int tc, n;
+
+	tc = netdev_get_prio_tc_map(dev, skb->priority);
+	packet_transmit_time = length_to_duration(q, qdisc_pkt_len(skb));
+
+	*interval_start = 0;
+	*interval_end = 0;
+
+	if (!sched)
+		return NULL;
+
+	cycle = sched->cycle_time;
+	cycle_elapsed = get_cycle_time_elapsed(sched, time);
+	curr_intv_end = ktime_sub_ns(time, cycle_elapsed);
+	cycle_end = ktime_add_ns(curr_intv_end, cycle);
+
+	list_for_each_entry(entry, &sched->entries, list) {
+		curr_intv_start = curr_intv_end;
+		curr_intv_end = get_interval_end_time(sched, admin, entry,
+						      curr_intv_start);
+
+		if (ktime_after(curr_intv_start, cycle_end))
+			break;
+
+		if (!(entry->gate_mask & BIT(tc)) ||
+		    packet_transmit_time > entry->interval)
+			continue;
+
+		txtime = entry->next_txtime;
+
+		if (ktime_before(txtime, time) || validate_interval) {
+			transmit_end_time = ktime_add_ns(time, packet_transmit_time);
+			if ((ktime_before(curr_intv_start, time) &&
+			     ktime_before(transmit_end_time, curr_intv_end)) ||
+			    (ktime_after(curr_intv_start, time) && !validate_interval)) {
+				entry_found = entry;
+				*interval_start = curr_intv_start;
+				*interval_end = curr_intv_end;
+				break;
+			} else if (!entry_available && !validate_interval) {
+				/* Here, we are just trying to find out the
+				 * first available interval in the next cycle.
+				 */
+				entry_available = 1;
+				entry_found = entry;
+				*interval_start = ktime_add_ns(curr_intv_start, cycle);
+				*interval_end = ktime_add_ns(curr_intv_end, cycle);
+			}
+		} else if (ktime_before(txtime, earliest_txtime) &&
+			   !entry_available) {
+			earliest_txtime = txtime;
+			entry_found = entry;
+			n = div_s64(ktime_sub(txtime, curr_intv_start), cycle);
+			*interval_start = ktime_add(curr_intv_start, n * cycle);
+			*interval_end = ktime_add(curr_intv_end, n * cycle);
+		}
+	}
+
+	return entry_found;
+}
+
+static bool is_valid_interval(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct taprio_sched *q = qdisc_priv(sch);
+	struct sched_gate_list *sched, *admin;
+	ktime_t interval_start, interval_end;
+	struct sched_entry *entry;
+
+	rcu_read_lock();
+	sched = rcu_dereference(q->oper_sched);
+	admin = rcu_dereference(q->admin_sched);
+
+	entry = find_entry_to_transmit(skb, sch, sched, admin, skb->tstamp,
+				       &interval_start, &interval_end, true);
+	rcu_read_unlock();
+
+	return entry;
+}
+
+/* There are a few scenarios where we will have to modify the txtime from
+ * what is read from next_txtime in sched_entry. They are:
+ * 1. If txtime is in the past,
+ *    a. The gate for the traffic class is currently open and packet can be
+ *       transmitted before it closes, schedule the packet right away.
+ *    b. If the gate corresponding to the traffic class is going to open later
+ *       in the cycle, set the txtime of packet to the interval start.
+ * 2. If txtime is in the future, there are packets corresponding to the
+ *    current traffic class waiting to be transmitted. So, the following
+ *    possibilities exist:
+ *    a. We can transmit the packet before the window containing the txtime
+ *       closes.
+ *    b. The window might close before the transmission can be completed
+ *       successfully. So, schedule the packet in the next open window.
+ */
+static long get_packet_txtime(struct sk_buff *skb, struct Qdisc *sch)
+{
+	ktime_t transmit_end_time, interval_end, interval_start;
+	struct taprio_sched *q = qdisc_priv(sch);
+	struct sched_gate_list *sched, *admin;
+	ktime_t minimum_time, now, txtime;
+	int len, packet_transmit_time;
+	struct sched_entry *entry;
+	bool sched_changed;
+
+	now = q->get_time();
+	minimum_time = ktime_add_ns(now, q->txtime_delay);
+
+	rcu_read_lock();
+	admin = rcu_dereference(q->admin_sched);
+	sched = rcu_dereference(q->oper_sched);
+	if (admin && ktime_after(minimum_time, admin->base_time))
+		switch_schedules(q, &admin, &sched);
+
+	/* Until the schedule starts, all the queues are open */
+	if (!sched || ktime_before(minimum_time, sched->base_time)) {
+		txtime = minimum_time;
+		goto done;
+	}
+
+	len = qdisc_pkt_len(skb);
+	packet_transmit_time = length_to_duration(q, len);
+
+	do {
+		sched_changed = 0;
+
+		entry = find_entry_to_transmit(skb, sch, sched, admin,
+					       minimum_time,
+					       &interval_start, &interval_end,
+					       false);
+		if (!entry) {
+			txtime = 0;
+			goto done;
+		}
+
+		txtime = entry->next_txtime;
+		txtime = max_t(ktime_t, txtime, minimum_time);
+		txtime = max_t(ktime_t, txtime, interval_start);
+
+		if (admin && admin != sched &&
+		    ktime_after(txtime, admin->base_time)) {
+			sched = admin;
+			sched_changed = 1;
+			continue;
+		}
+
+		transmit_end_time = ktime_add(txtime, packet_transmit_time);
+		minimum_time = transmit_end_time;
+
+		/* Update the txtime of current entry to the next time it's
+		 * interval starts.
+		 */
+		if (ktime_after(transmit_end_time, interval_end))
+			entry->next_txtime = ktime_add(interval_start, sched->cycle_time);
+	} while (sched_changed || ktime_after(transmit_end_time, interval_end));
+
+	entry->next_txtime = transmit_end_time;
+
+done:
+	rcu_read_unlock();
+	return txtime;
+}
+
 static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 			  struct sk_buff **to_free)
 {
@@ -121,6 +349,15 @@ static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	if (unlikely(!child))
 		return qdisc_drop(skb, sch, to_free);
 
+	if (skb->sk && sock_flag(skb->sk, SOCK_TXTIME)) {
+		if (!is_valid_interval(skb, sch))
+			return qdisc_drop(skb, sch, to_free);
+	} else if (TXTIME_ASSIST_IS_ENABLED(q->flags)) {
+		skb->tstamp = get_packet_txtime(skb, sch);
+		if (!skb->tstamp)
+			return qdisc_drop(skb, sch, to_free);
+	}
+
 	qdisc_qstats_backlog_inc(sch, skb);
 	sch->q.qlen++;
 
@@ -156,6 +393,9 @@ static struct sk_buff *taprio_peek(struct Qdisc *sch)
 		if (!skb)
 			continue;
 
+		if (TXTIME_ASSIST_IS_ENABLED(q->flags))
+			return skb;
+
 		prio = skb->priority;
 		tc = netdev_get_prio_tc_map(dev, prio);
 
@@ -168,11 +408,6 @@ static struct sk_buff *taprio_peek(struct Qdisc *sch)
 	return NULL;
 }
 
-static int length_to_duration(struct taprio_sched *q, int len)
-{
-	return div_u64(len * atomic64_read(&q->picos_per_byte), 1000);
-}
-
 static void taprio_set_budget(struct taprio_sched *q, struct sched_entry *entry)
 {
 	atomic_set(&entry->budget,
@@ -216,6 +451,13 @@ static struct sk_buff *taprio_dequeue(struct Qdisc *sch)
 		if (unlikely(!child))
 			continue;
 
+		if (TXTIME_ASSIST_IS_ENABLED(q->flags)) {
+			skb = child->ops->dequeue(child);
+			if (!skb)
+				continue;
+			goto skb_found;
+		}
+
 		skb = child->ops->peek(child);
 		if (!skb)
 			continue;
@@ -246,6 +488,7 @@ static struct sk_buff *taprio_dequeue(struct Qdisc *sch)
 		if (unlikely(!skb))
 			goto done;
 
+skb_found:
 		qdisc_bstats_update(sch, skb);
 		qdisc_qstats_backlog_dec(sch, skb);
 		sch->q.qlen--;
@@ -522,7 +765,8 @@ static int parse_taprio_schedule(struct nlattr **tb,
 
 static int taprio_parse_mqprio_opt(struct net_device *dev,
 				   struct tc_mqprio_qopt *qopt,
-				   struct netlink_ext_ack *extack)
+				   struct netlink_ext_ack *extack,
+				   u32 taprio_flags)
 {
 	int i, j;
 
@@ -570,6 +814,9 @@ static int taprio_parse_mqprio_opt(struct net_device *dev,
 			return -EINVAL;
 		}
 
+		if (TXTIME_ASSIST_IS_ENABLED(taprio_flags))
+			continue;
+
 		/* Verify that the offset and counts do not overlap */
 		for (j = i + 1; j < qopt->num_tc; j++) {
 			if (last > qopt->offset[j]) {
@@ -700,6 +947,18 @@ static int taprio_dev_notifier(struct notifier_block *nb, unsigned long event,
 	return NOTIFY_DONE;
 }
 
+static void setup_txtime(struct taprio_sched *q,
+			 struct sched_gate_list *sched, ktime_t base)
+{
+	struct sched_entry *entry;
+	u32 interval = 0;
+
+	list_for_each_entry(entry, &sched->entries, list) {
+		entry->next_txtime = ktime_add_ns(base, interval);
+		interval += entry->interval;
+	}
+}
+
 static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 			 struct netlink_ext_ack *extack)
 {
@@ -708,6 +967,7 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 	struct taprio_sched *q = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
 	struct tc_mqprio_qopt *mqprio = NULL;
+	u32 taprio_flags = 0;
 	int i, err, clockid;
 	unsigned long flags;
 	ktime_t start;
@@ -720,7 +980,21 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 	if (tb[TCA_TAPRIO_ATTR_PRIOMAP])
 		mqprio = nla_data(tb[TCA_TAPRIO_ATTR_PRIOMAP]);
 
-	err = taprio_parse_mqprio_opt(dev, mqprio, extack);
+	if (tb[TCA_TAPRIO_ATTR_FLAGS]) {
+		taprio_flags = nla_get_u32(tb[TCA_TAPRIO_ATTR_FLAGS]);
+
+		if (q->flags != 0 && q->flags != taprio_flags) {
+			NL_SET_ERR_MSG_MOD(extack, "Changing 'flags' of a running schedule is not supported");
+			return -EOPNOTSUPP;
+		} else if (!FLAGS_VALID(taprio_flags)) {
+			NL_SET_ERR_MSG_MOD(extack, "Specified 'flags' are not valid");
+			return -EINVAL;
+		}
+
+		q->flags = taprio_flags;
+	}
+
+	err = taprio_parse_mqprio_opt(dev, mqprio, extack, taprio_flags);
 	if (err < 0)
 		return err;
 
@@ -779,7 +1053,18 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 	/* Protects against enqueue()/dequeue() */
 	spin_lock_bh(qdisc_lock(sch));
 
-	if (!hrtimer_active(&q->advance_timer)) {
+	if (tb[TCA_TAPRIO_ATTR_TXTIME_DELAY]) {
+		if (!TXTIME_ASSIST_IS_ENABLED(q->flags)) {
+			NL_SET_ERR_MSG_MOD(extack, "txtime-delay can only be set when txtime-assist mode is enabled");
+			err = -EINVAL;
+			goto unlock;
+		}
+
+		q->txtime_delay = nla_get_s32(tb[TCA_TAPRIO_ATTR_TXTIME_DELAY]);
+	}
+
+	if (!TXTIME_ASSIST_IS_ENABLED(taprio_flags) &&
+	    !hrtimer_active(&q->advance_timer)) {
 		hrtimer_init(&q->advance_timer, q->clockid, HRTIMER_MODE_ABS);
 		q->advance_timer.function = advance_sched;
 	}
@@ -822,20 +1107,35 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 		goto unlock;
 	}
 
-	setup_first_close_time(q, new_admin, start);
+	if (TXTIME_ASSIST_IS_ENABLED(taprio_flags)) {
+		setup_txtime(q, new_admin, start);
 
-	/* Protects against advance_sched() */
-	spin_lock_irqsave(&q->current_entry_lock, flags);
+		if (!oper) {
+			rcu_assign_pointer(q->oper_sched, new_admin);
+			err = 0;
+			new_admin = NULL;
+			goto unlock;
+		}
 
-	taprio_start_sched(sch, start, new_admin);
+		rcu_assign_pointer(q->admin_sched, new_admin);
+		if (admin)
+			call_rcu(&admin->rcu, taprio_free_sched_cb);
+	} else {
+		setup_first_close_time(q, new_admin, start);
 
-	rcu_assign_pointer(q->admin_sched, new_admin);
-	if (admin)
-		call_rcu(&admin->rcu, taprio_free_sched_cb);
-	new_admin = NULL;
+		/* Protects against advance_sched() */
+		spin_lock_irqsave(&q->current_entry_lock, flags);
+
+		taprio_start_sched(sch, start, new_admin);
 
-	spin_unlock_irqrestore(&q->current_entry_lock, flags);
+		rcu_assign_pointer(q->admin_sched, new_admin);
+		if (admin)
+			call_rcu(&admin->rcu, taprio_free_sched_cb);
 
+		spin_unlock_irqrestore(&q->current_entry_lock, flags);
+	}
+
+	new_admin = NULL;
 	err = 0;
 
 unlock:
@@ -1073,6 +1373,13 @@ static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 	if (nla_put_s32(skb, TCA_TAPRIO_ATTR_SCHED_CLOCKID, q->clockid))
 		goto options_error;
 
+	if (q->flags && nla_put_u32(skb, TCA_TAPRIO_ATTR_FLAGS, q->flags))
+		goto options_error;
+
+	if (q->txtime_delay &&
+	    nla_put_s32(skb, TCA_TAPRIO_ATTR_TXTIME_DELAY, q->txtime_delay))
+		goto options_error;
+
 	if (oper && dump_schedule(skb, oper))
 		goto options_error;
 
-- 
2.7.3


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH net-next v6 7/8] taprio: make clock reference conversions easier
  2019-06-25 22:07 [PATCH net-next v6 0/8] net/sched: Add txtime-assist support for taprio Vedang Patel
                   ` (5 preceding siblings ...)
  2019-06-25 22:07 ` [PATCH net-next v6 6/8] taprio: Add support for txtime-assist mode Vedang Patel
@ 2019-06-25 22:07 ` Vedang Patel
  2019-06-25 22:07 ` [PATCH net-next v6 8/8] taprio: Adjust timestamps for TCP packets Vedang Patel
  2019-06-28 21:46 ` [PATCH net-next v6 0/8] net/sched: Add txtime-assist support for taprio David Miller
  8 siblings, 0 replies; 13+ messages in thread
From: Vedang Patel @ 2019-06-25 22:07 UTC (permalink / raw)
  To: netdev
  Cc: jeffrey.t.kirsher, davem, jhs, xiyou.wangcong, jiri,
	intel-wired-lan, vinicius.gomes, l, jakub.kicinski, m-karicheri2,
	sergei.shtylyov, eric.dumazet, aaron.f.brown, Vedang Patel

Later in this series we will need to transform from
CLOCK_MONOTONIC (used in TCP) to the clock reference used in TAPRIO.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
---
 net/sched/sch_taprio.c | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 078230e44471..5c5e7db520dc 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -61,6 +61,7 @@ struct taprio_sched {
 	struct Qdisc **qdiscs;
 	struct Qdisc *root;
 	u32 flags;
+	enum tk_offsets tk_offset;
 	int clockid;
 	atomic64_t picos_per_byte; /* Using picoseconds because for 10Gbps+
 				    * speeds it's sub-nanoseconds per byte
@@ -71,7 +72,6 @@ struct taprio_sched {
 	struct sched_entry __rcu *current_entry;
 	struct sched_gate_list __rcu *oper_sched;
 	struct sched_gate_list __rcu *admin_sched;
-	ktime_t (*get_time)(void);
 	struct hrtimer advance_timer;
 	struct list_head taprio_list;
 	int txtime_delay;
@@ -85,6 +85,20 @@ static ktime_t sched_base_time(const struct sched_gate_list *sched)
 	return ns_to_ktime(sched->base_time);
 }
 
+static ktime_t taprio_get_time(struct taprio_sched *q)
+{
+	ktime_t mono = ktime_get();
+
+	switch (q->tk_offset) {
+	case TK_OFFS_MAX:
+		return mono;
+	default:
+		return ktime_mono_to_any(mono, q->tk_offset);
+	}
+
+	return KTIME_MAX;
+}
+
 static void taprio_free_sched_cb(struct rcu_head *head)
 {
 	struct sched_gate_list *sched = container_of(head, struct sched_gate_list, rcu);
@@ -278,7 +292,7 @@ static long get_packet_txtime(struct sk_buff *skb, struct Qdisc *sch)
 	struct sched_entry *entry;
 	bool sched_changed;
 
-	now = q->get_time();
+	now = taprio_get_time(q);
 	minimum_time = ktime_add_ns(now, q->txtime_delay);
 
 	rcu_read_lock();
@@ -469,7 +483,7 @@ static struct sk_buff *taprio_dequeue(struct Qdisc *sch)
 			continue;
 
 		len = qdisc_pkt_len(skb);
-		guard = ktime_add_ns(q->get_time(),
+		guard = ktime_add_ns(taprio_get_time(q),
 				     length_to_duration(q, len));
 
 		/* In the case that there's no gate entry, there's no
@@ -838,7 +852,7 @@ static int taprio_get_start_time(struct Qdisc *sch,
 	s64 n;
 
 	base = sched_base_time(sched);
-	now = q->get_time();
+	now = taprio_get_time(q);
 
 	if (ktime_after(base, now)) {
 		*start = base;
@@ -1084,16 +1098,16 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 
 	switch (q->clockid) {
 	case CLOCK_REALTIME:
-		q->get_time = ktime_get_real;
+		q->tk_offset = TK_OFFS_REAL;
 		break;
 	case CLOCK_MONOTONIC:
-		q->get_time = ktime_get;
+		q->tk_offset = TK_OFFS_MAX;
 		break;
 	case CLOCK_BOOTTIME:
-		q->get_time = ktime_get_boottime;
+		q->tk_offset = TK_OFFS_BOOT;
 		break;
 	case CLOCK_TAI:
-		q->get_time = ktime_get_clocktai;
+		q->tk_offset = TK_OFFS_TAI;
 		break;
 	default:
 		NL_SET_ERR_MSG(extack, "Invalid 'clockid'");
-- 
2.7.3


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH net-next v6 8/8] taprio: Adjust timestamps for TCP packets
  2019-06-25 22:07 [PATCH net-next v6 0/8] net/sched: Add txtime-assist support for taprio Vedang Patel
                   ` (6 preceding siblings ...)
  2019-06-25 22:07 ` [PATCH net-next v6 7/8] taprio: make clock reference conversions easier Vedang Patel
@ 2019-06-25 22:07 ` Vedang Patel
  2019-06-28 21:46 ` [PATCH net-next v6 0/8] net/sched: Add txtime-assist support for taprio David Miller
  8 siblings, 0 replies; 13+ messages in thread
From: Vedang Patel @ 2019-06-25 22:07 UTC (permalink / raw)
  To: netdev
  Cc: jeffrey.t.kirsher, davem, jhs, xiyou.wangcong, jiri,
	intel-wired-lan, vinicius.gomes, l, jakub.kicinski, m-karicheri2,
	sergei.shtylyov, eric.dumazet, aaron.f.brown, Vedang Patel

When the taprio qdisc is running in "txtime offload" mode, it will
set the launchtime value (in skb->tstamp) for all the packets which do
not have the SO_TXTIME socket option. But, the TCP packets already have
this value set and it indicates the earliest departure time represented
in CLOCK_MONOTONIC clock.

We need to respect the timestamp set by the TCP subsystem. So, convert
this time to the clock which taprio is using and ensure that the packet
is not transmitted before the deadline set by TCP.

Signed-off-by: Vedang Patel <vedang.patel@intel.com>
---
 net/sched/sch_taprio.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 5c5e7db520dc..388750ddc57a 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -22,6 +22,7 @@
 #include <net/pkt_cls.h>
 #include <net/sch_generic.h>
 #include <net/sock.h>
+#include <net/tcp.h>
 
 static LIST_HEAD(taprio_list);
 static DEFINE_SPINLOCK(taprio_list_lock);
@@ -267,6 +268,41 @@ static bool is_valid_interval(struct sk_buff *skb, struct Qdisc *sch)
 	return entry;
 }
 
+/* This returns the tstamp value set by TCP in terms of the set clock. */
+static ktime_t get_tcp_tstamp(struct taprio_sched *q, struct sk_buff *skb)
+{
+	unsigned int offset = skb_network_offset(skb);
+	const struct ipv6hdr *ipv6h;
+	const struct iphdr *iph;
+	struct ipv6hdr _ipv6h;
+
+	ipv6h = skb_header_pointer(skb, offset, sizeof(_ipv6h), &_ipv6h);
+	if (!ipv6h)
+		return 0;
+
+	if (ipv6h->version == 4) {
+		iph = (struct iphdr *)ipv6h;
+		offset += iph->ihl * 4;
+
+		/* special-case 6in4 tunnelling, as that is a common way to get
+		 * v6 connectivity in the home
+		 */
+		if (iph->protocol == IPPROTO_IPV6) {
+			ipv6h = skb_header_pointer(skb, offset,
+						   sizeof(_ipv6h), &_ipv6h);
+
+			if (!ipv6h || ipv6h->nexthdr != IPPROTO_TCP)
+				return 0;
+		} else if (iph->protocol != IPPROTO_TCP) {
+			return 0;
+		}
+	} else if (ipv6h->version == 6 && ipv6h->nexthdr != IPPROTO_TCP) {
+		return 0;
+	}
+
+	return ktime_mono_to_any(skb->skb_mstamp_ns, q->tk_offset);
+}
+
 /* There are a few scenarios where we will have to modify the txtime from
  * what is read from next_txtime in sched_entry. They are:
  * 1. If txtime is in the past,
@@ -284,7 +320,7 @@ static bool is_valid_interval(struct sk_buff *skb, struct Qdisc *sch)
  */
 static long get_packet_txtime(struct sk_buff *skb, struct Qdisc *sch)
 {
-	ktime_t transmit_end_time, interval_end, interval_start;
+	ktime_t transmit_end_time, interval_end, interval_start, tcp_tstamp;
 	struct taprio_sched *q = qdisc_priv(sch);
 	struct sched_gate_list *sched, *admin;
 	ktime_t minimum_time, now, txtime;
@@ -295,6 +331,9 @@ static long get_packet_txtime(struct sk_buff *skb, struct Qdisc *sch)
 	now = taprio_get_time(q);
 	minimum_time = ktime_add_ns(now, q->txtime_delay);
 
+	tcp_tstamp = get_tcp_tstamp(q, skb);
+	minimum_time = max_t(ktime_t, minimum_time, tcp_tstamp);
+
 	rcu_read_lock();
 	admin = rcu_dereference(q->admin_sched);
 	sched = rcu_dereference(q->oper_sched);
-- 
2.7.3


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v6 1/8] igb: clear out skb->tstamp after reading the txtime
  2019-06-25 22:07 ` [PATCH net-next v6 1/8] igb: clear out skb->tstamp after reading the txtime Vedang Patel
@ 2019-06-26 19:44   ` Jeff Kirsher
  2019-06-26 20:06     ` Patel, Vedang
  2019-06-28  3:01   ` Brown, Aaron F
  1 sibling, 1 reply; 13+ messages in thread
From: Jeff Kirsher @ 2019-06-26 19:44 UTC (permalink / raw)
  To: Vedang Patel, netdev
  Cc: davem, jhs, xiyou.wangcong, jiri, intel-wired-lan,
	vinicius.gomes, l, jakub.kicinski, m-karicheri2, sergei.shtylyov,
	eric.dumazet, aaron.f.brown

[-- Attachment #1: Type: text/plain, Size: 1683 bytes --]

On Tue, 2019-06-25 at 15:07 -0700, Vedang Patel wrote:
> If a packet which is utilizing the launchtime feature (via SO_TXTIME
> socket
> option) also requests the hardware transmit timestamp, the hardware
> timestamp is not delivered to the userspace. This is because the
> value in
> skb->tstamp is mistaken as the software timestamp.
> 
> Applications, like ptp4l, request a hardware timestamp by setting the
> SOF_TIMESTAMPING_TX_HARDWARE socket option. Whenever a new timestamp
> is
> detected by the driver (this work is done in igb_ptp_tx_work() which
> calls
> igb_ptp_tx_hwtstamps() in igb_ptp.c[1]), it will queue the timestamp
> in the
> ERR_QUEUE for the userspace to read. When the userspace is ready, it
> will
> issue a recvmsg() call to collect this timestamp.  The problem is in
> this
> recvmsg() call. If the skb->tstamp is not cleared out, it will be
> interpreted as a software timestamp and the hardware tx timestamp
> will not
> be successfully sent to the userspace. Look at skb_is_swtx_tstamp()
> and the
> callee function __sock_recv_timestamp() in net/socket.c for more
> details.
> 
> Signed-off-by: Vedang Patel <vedang.patel@intel.com>
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 1 +
>  1 file changed, 1 insertion(+)

Since this fix is really not needed for the rest of the patch series,
if you have to do another version of the series, can you drop this
patch from any future version?  I don't want to keep spinning my
validation team on a updated version of this patch, that is not really
being updated.

I plan to take this version of the patch and will hold onto it for my
next 1GbE updates to Dave.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v6 1/8] igb: clear out skb->tstamp after reading the txtime
  2019-06-26 19:44   ` Jeff Kirsher
@ 2019-06-26 20:06     ` Patel, Vedang
  0 siblings, 0 replies; 13+ messages in thread
From: Patel, Vedang @ 2019-06-26 20:06 UTC (permalink / raw)
  To: Kirsher, Jeffrey T
  Cc: netdev, David Miller, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	intel-wired-lan, Gomes, Vinicius, l, Jakub Kicinski,
	Murali Karicheri, Sergei Shtylyov, Eric Dumazet, Brown, Aaron F

Hi Jeff,

> On Jun 26, 2019, at 12:44 PM, Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com> wrote:
> 
> On Tue, 2019-06-25 at 15:07 -0700, Vedang Patel wrote:
>> If a packet which is utilizing the launchtime feature (via SO_TXTIME
>> socket
>> option) also requests the hardware transmit timestamp, the hardware
>> timestamp is not delivered to the userspace. This is because the
>> value in
>> skb->tstamp is mistaken as the software timestamp.
>> 
>> Applications, like ptp4l, request a hardware timestamp by setting the
>> SOF_TIMESTAMPING_TX_HARDWARE socket option. Whenever a new timestamp
>> is
>> detected by the driver (this work is done in igb_ptp_tx_work() which
>> calls
>> igb_ptp_tx_hwtstamps() in igb_ptp.c[1]), it will queue the timestamp
>> in the
>> ERR_QUEUE for the userspace to read. When the userspace is ready, it
>> will
>> issue a recvmsg() call to collect this timestamp.  The problem is in
>> this
>> recvmsg() call. If the skb->tstamp is not cleared out, it will be
>> interpreted as a software timestamp and the hardware tx timestamp
>> will not
>> be successfully sent to the userspace. Look at skb_is_swtx_tstamp()
>> and the
>> callee function __sock_recv_timestamp() in net/socket.c for more
>> details.
>> 
>> Signed-off-by: Vedang Patel <vedang.patel@intel.com>
>> ---
>> drivers/net/ethernet/intel/igb/igb_main.c | 1 +
>> 1 file changed, 1 insertion(+)
> 
> Since this fix is really not needed for the rest of the patch series,
> if you have to do another version of the series, can you drop this
> patch from any future version?  I don't want to keep spinning my
> validation team on a updated version of this patch, that is not really
> being updated.
> 
> I plan to take this version of the patch and will hold onto it for my
> next 1GbE updates to Dave.

This patch is needed for ptp4l to function properly when txtime-assist mode is enabled. So, dropping this patch will break the series. When are you planning to submit the next set of updates to Dave? If a new version is needed, I can plan to send it out after you send your updates.

Thanks,
Vedang

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH net-next v6 1/8] igb: clear out skb->tstamp after reading the txtime
  2019-06-25 22:07 ` [PATCH net-next v6 1/8] igb: clear out skb->tstamp after reading the txtime Vedang Patel
  2019-06-26 19:44   ` Jeff Kirsher
@ 2019-06-28  3:01   ` Brown, Aaron F
  1 sibling, 0 replies; 13+ messages in thread
From: Brown, Aaron F @ 2019-06-28  3:01 UTC (permalink / raw)
  To: Patel, Vedang, netdev
  Cc: Kirsher, Jeffrey T, davem, jhs, xiyou.wangcong, jiri,
	intel-wired-lan, Gomes, Vinicius, l, jakub.kicinski,
	m-karicheri2, sergei.shtylyov, eric.dumazet

> From: Patel, Vedang
> Sent: Tuesday, June 25, 2019 3:07 PM
> To: netdev@vger.kernel.org
> Cc: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; davem@davemloft.net;
> jhs@mojatatu.com; xiyou.wangcong@gmail.com; jiri@resnulli.us; intel-wired-
> lan@lists.osuosl.org; Gomes, Vinicius <vinicius.gomes@intel.com>;
> l@dorileo.org; jakub.kicinski@netronome.com; m-karicheri2@ti.com;
> sergei.shtylyov@cogentembedded.com; eric.dumazet@gmail.com; Brown,
> Aaron F <aaron.f.brown@intel.com>; Patel, Vedang <vedang.patel@intel.com>
> Subject: [PATCH net-next v6 1/8] igb: clear out skb->tstamp after reading the
> txtime
> 
> If a packet which is utilizing the launchtime feature (via SO_TXTIME socket
> option) also requests the hardware transmit timestamp, the hardware
> timestamp is not delivered to the userspace. This is because the value in
> skb->tstamp is mistaken as the software timestamp.
> 
> Applications, like ptp4l, request a hardware timestamp by setting the
> SOF_TIMESTAMPING_TX_HARDWARE socket option. Whenever a new
> timestamp is
> detected by the driver (this work is done in igb_ptp_tx_work() which calls
> igb_ptp_tx_hwtstamps() in igb_ptp.c[1]), it will queue the timestamp in the
> ERR_QUEUE for the userspace to read. When the userspace is ready, it will
> issue a recvmsg() call to collect this timestamp.  The problem is in this
> recvmsg() call. If the skb->tstamp is not cleared out, it will be
> interpreted as a software timestamp and the hardware tx timestamp will not
> be successfully sent to the userspace. Look at skb_is_swtx_tstamp() and the
> callee function __sock_recv_timestamp() in net/socket.c for more details.
> 
> Signed-off-by: Vedang Patel <vedang.patel@intel.com>
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 1 +
>  1 file changed, 1 insertion(+)
> 
Tested-by: Aaron Brown <aaron.f.brown@intel.com>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v6 0/8] net/sched: Add txtime-assist support for taprio.
  2019-06-25 22:07 [PATCH net-next v6 0/8] net/sched: Add txtime-assist support for taprio Vedang Patel
                   ` (7 preceding siblings ...)
  2019-06-25 22:07 ` [PATCH net-next v6 8/8] taprio: Adjust timestamps for TCP packets Vedang Patel
@ 2019-06-28 21:46 ` David Miller
  8 siblings, 0 replies; 13+ messages in thread
From: David Miller @ 2019-06-28 21:46 UTC (permalink / raw)
  To: vedang.patel
  Cc: netdev, jeffrey.t.kirsher, jhs, xiyou.wangcong, jiri,
	intel-wired-lan, vinicius.gomes, l, jakub.kicinski, m-karicheri2,
	sergei.shtylyov, eric.dumazet, aaron.f.brown

From: Vedang Patel <vedang.patel@intel.com>
Date: Tue, 25 Jun 2019 15:07:11 -0700

> Currently, we are seeing packets being transmitted outside their
> timeslices. We can confirm that the packets are being dequeued at the right
> time. So, the delay is induced after the packet is dequeued, because
> taprio, without any offloading, has no control of when a packet is actually
> transmitted.
> 
> In order to solve this, we are making use of the txtime feature provided by
> ETF qdisc. Hardware offloading needs to be supported by the ETF qdisc in
> order to take advantage of this feature. The taprio qdisc will assign
> txtime (in skb->tstamp) for all the packets which do not have the txtime
> allocated via the SO_TXTIME socket option. For the packets which already
> have SO_TXTIME set, taprio will validate whether the packet will be
> transmitted in the correct interval.
> 
> In order to support this, the following parameters have been added:
> - flags (taprio): This is added in order to support different offloading
>   modes which will be added in the future.
> - txtime-delay (taprio): This indicates the minimum time it will take for
>   the packet to hit the wire after it reaches taprio_enqueue(). This is
>   useful in determining whether we can transmit the packet in the remaining
>   time if the gate corresponding to the packet is currently open.
> - skip_skb_check (ETF): ETF currently drops any packet which does not have
>   the SO_TXTIME socket option set. This check can be skipped by specifying
>   this option.
 ...

Series applied, thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, back to index

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-25 22:07 [PATCH net-next v6 0/8] net/sched: Add txtime-assist support for taprio Vedang Patel
2019-06-25 22:07 ` [PATCH net-next v6 1/8] igb: clear out skb->tstamp after reading the txtime Vedang Patel
2019-06-26 19:44   ` Jeff Kirsher
2019-06-26 20:06     ` Patel, Vedang
2019-06-28  3:01   ` Brown, Aaron F
2019-06-25 22:07 ` [PATCH net-next v6 2/8] etf: Don't use BIT() in UAPI headers Vedang Patel
2019-06-25 22:07 ` [PATCH net-next v6 3/8] etf: Add skip_sock_check Vedang Patel
2019-06-25 22:07 ` [PATCH net-next v6 4/8] taprio: calculate cycle_time when schedule is installed Vedang Patel
2019-06-25 22:07 ` [PATCH net-next v6 5/8] taprio: Remove inline directive Vedang Patel
2019-06-25 22:07 ` [PATCH net-next v6 6/8] taprio: Add support for txtime-assist mode Vedang Patel
2019-06-25 22:07 ` [PATCH net-next v6 7/8] taprio: make clock reference conversions easier Vedang Patel
2019-06-25 22:07 ` [PATCH net-next v6 8/8] taprio: Adjust timestamps for TCP packets Vedang Patel
2019-06-28 21:46 ` [PATCH net-next v6 0/8] net/sched: Add txtime-assist support for taprio David Miller

Netdev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/netdev/0 netdev/git/0.git
	git clone --mirror https://lore.kernel.org/netdev/1 netdev/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 netdev netdev/ https://lore.kernel.org/netdev \
		netdev@vger.kernel.org netdev@archiver.kernel.org
	public-inbox-index netdev


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.netdev


AGPL code for this site: git clone https://public-inbox.org/ public-inbox