All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 net-next 00/14] Scheduled packet Transmission: ETF
@ 2018-06-27 21:59 Jesus Sanchez-Palencia
  2018-06-27 21:59 ` [PATCH v1 net-next 01/14] net: Clear skb->tstamp only on the forwarding path Jesus Sanchez-Palencia
                   ` (13 more replies)
  0 siblings, 14 replies; 31+ messages in thread
From: Jesus Sanchez-Palencia @ 2018-06-27 21:59 UTC (permalink / raw)
  To: netdev
  Cc: tglx, jan.altenberg, vinicius.gomes, kurt.kanzenbach, henrik,
	richardcochran, levi.pearson, ilias.apalodimas, ivan.khoronzhuk,
	mlichvar, willemb, jhs, xiyou.wangcong, jiri

Overview
========

This work consists of a set of kernel interfaces that can be used by
applications that require (time-based) Scheduled Tx of packets.
It is comprised by 3 new components to the kernel:

  - SO_TXTIME: socket option + cmsg programming interfaces.

  - etf: the "earliest txtime first" qdisc, that provides per-queue
	 TxTime-based scheduling. This has been renamed from 'tbs' to
	 'etf' to better describe its functionality.

  - taprio: the "time-aware priority scheduler" qdisc, that provides
	    per-port Time-Aware scheduling;

This patchset is providing the first 2 components, which have been
developed for longer. The taprio qdisc will be shared as an RFC separately
(shortly).

Note that this series is a follow up of the "Time based packet
transmission" RFCv3 [1].



etf (formerly known as 'tbs')
=============================

Changes since the RFC v3:
  - removed patch adding CLOCKID_INVALID;
  - now we report packet drops through the socket's error queue;
  - the usage of CLOCK_TAI is enforced by the qdisc;
  - fixed bug on igb driver to avoid timestamps from being overwritten;
  - simplified queueing modes by making 'sorting' mandatory;
  - renamed qdisc from 'tbs' to 'etf'.

For applications/systems that the concept of time slices isn't precise
enough, the etf qdisc allows applications to control the instant when
a packet should leave the network controller. When used in conjunction
with taprio, it can also be used in case the application needs to
control with greater guarantee the offset into each time slice a packet
will be sent. Another use case of etf, is when only a small number of
applications on a system are time sensitive, so it can then be used
with a more traditional root qdisc (like mqprio).

The etf qdisc is designed so it buffers packets until a configurable
time before their deadline (Tx time). The qdisc uses a rbtree internally
so the buffered packets are always 'ordered' by their txtime (deadline)
and will be dequeued following the earliest txtime first.

The qdisc will drop any packets with a Tx time in the past, or if a
packet expires while waiting for being dequeued. Drops can be reported
as errors back to userspace through the socket's error queue.

Example configuration:

$ tc qdisc add dev enp2s0 parent 100:1 etf offload delta 200000 \
            clockid CLOCK_TAI

Here, the Qdisc will use HW offload for the txtime control.
Packets will be dequeued by the qdisc "delta" (200000) nanoseconds before
their transmission time. Because this will be using HW offload and
since dynamic clocks are not supported by hrtimers, the system clock
and the PHC clock must be synchronized for this mode to behave as expected.

A more complete example can be found here, with instructions of how to
test it:

https://gist.github.com/jeez/bd3afeff081ba64a695008dd8215866f [2]


Note that we haven't modified the qdisc so it uses a timerqueue because
the modification needed was increasing the number of cachelines of a sk_buff.



SO_TXTIME
=========

Changes since the RFC v3:
  - skb->tstamp is now cleared in skb_scrub_packet();
  - transmit time is now set for other send paths, and not only the
    "fast" ones as before;
  - removed the per-packet parameters (clockid and drop_if_late).
    Now just the skb->tstamp is used;
  - flags and clockid_t are now set per-socket as a parameter of
    SO_TXTIME.



This series is also hosted on github and can be found at [3].
The companion iproute2 patches can be found at [4].


[1] https://patchwork.ozlabs.org/cover/882342/

[2] github doesn't make it clear, but the gist can be cloned like this:
$ git clone https://gist.github.com/jeez/bd3afeff081ba64a695008dd8215866f scheduled-tx-tests

[3] https://github.com/jeez/linux/tree/etf-v1

[4] https://github.com/jeez/iproute2/tree/etf-v1



Jesus Sanchez-Palencia (10):
  net: Clear skb->tstamp only on the forwarding path
  net: ipv4: Hook into time based transmission
  net/sched: Add HW offloading capability to ETF
  igb: Refactor igb_configure_cbs()
  igb: Only change Tx arbitration when CBS is on
  igb: Refactor igb_offload_cbs()
  igb: Add support for ETF offload
  igb: Only call skb_tx_timestamp after descriptors are ready
  net/sched: Enforce usage of CLOCK_TAI for sch_etf
  net/sched: Make etf report drops on error_queue

Richard Cochran (2):
  net: Add a new socket option for a future transmit time.
  net: packet: Hook into time based transmission.

Vinicius Costa Gomes (2):
  net/sched: Allow creating a Qdisc watchdog with other clocks
  net/sched: Introduce the ETF Qdisc

 arch/alpha/include/uapi/asm/socket.h          |   3 +
 arch/ia64/include/uapi/asm/socket.h           |   3 +
 arch/mips/include/uapi/asm/socket.h           |   3 +
 arch/parisc/include/uapi/asm/socket.h         |   3 +
 arch/s390/include/uapi/asm/socket.h           |   3 +
 arch/sparc/include/uapi/asm/socket.h          |   3 +
 arch/xtensa/include/uapi/asm/socket.h         |   3 +
 .../net/ethernet/intel/igb/e1000_defines.h    |  16 +
 drivers/net/ethernet/intel/igb/igb.h          |   1 +
 drivers/net/ethernet/intel/igb/igb_main.c     | 257 ++++++---
 include/linux/netdevice.h                     |   1 +
 include/linux/socket.h                        |   7 +
 include/net/inet_sock.h                       |   1 +
 include/net/pkt_sched.h                       |   7 +
 include/net/sock.h                            |   9 +
 include/uapi/asm-generic/socket.h             |   3 +
 include/uapi/linux/errqueue.h                 |   2 +
 include/uapi/linux/pkt_sched.h                |  18 +
 net/core/skbuff.c                             |   2 +-
 net/core/sock.c                               |  32 ++
 net/ipv4/ip_output.c                          |   3 +
 net/ipv4/raw.c                                |   2 +
 net/ipv4/udp.c                                |   1 +
 net/packet/af_packet.c                        |   6 +
 net/sched/Kconfig                             |  11 +
 net/sched/Makefile                            |   1 +
 net/sched/sch_api.c                           |  11 +-
 net/sched/sch_etf.c                           | 487 ++++++++++++++++++
 28 files changed, 829 insertions(+), 70 deletions(-)
 create mode 100644 net/sched/sch_etf.c

-- 
2.17.1

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v1 net-next 01/14] net: Clear skb->tstamp only on the forwarding path
  2018-06-27 21:59 [PATCH v1 net-next 00/14] Scheduled packet Transmission: ETF Jesus Sanchez-Palencia
@ 2018-06-27 21:59 ` Jesus Sanchez-Palencia
  2018-06-27 21:59 ` [PATCH v1 net-next 02/14] net: Add a new socket option for a future transmit time Jesus Sanchez-Palencia
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Jesus Sanchez-Palencia @ 2018-06-27 21:59 UTC (permalink / raw)
  To: netdev
  Cc: tglx, jan.altenberg, vinicius.gomes, kurt.kanzenbach, henrik,
	richardcochran, levi.pearson, ilias.apalodimas, ivan.khoronzhuk,
	mlichvar, willemb, jhs, xiyou.wangcong, jiri,
	Jesus Sanchez-Palencia

This is done in preparation for the upcoming time based transmission
patchset. Now that skb->tstamp will be used to hold packet's txtime,
we must ensure that it is being cleared when traversing namespaces.
Also, doing that from skb_scrub_packet() before the early return would
break our feature when tunnels are used.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
---
 net/core/skbuff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b1f274f22d85..236802b35203 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4898,7 +4898,6 @@ EXPORT_SYMBOL(skb_try_coalesce);
  */
 void skb_scrub_packet(struct sk_buff *skb, bool xnet)
 {
-	skb->tstamp = 0;
 	skb->pkt_type = PACKET_HOST;
 	skb->skb_iif = 0;
 	skb->ignore_df = 0;
@@ -4913,6 +4912,7 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet)
 	ipvs_reset(skb);
 	skb_orphan(skb);
 	skb->mark = 0;
+	skb->tstamp = 0;
 }
 EXPORT_SYMBOL_GPL(skb_scrub_packet);
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v1 net-next 02/14] net: Add a new socket option for a future transmit time.
  2018-06-27 21:59 [PATCH v1 net-next 00/14] Scheduled packet Transmission: ETF Jesus Sanchez-Palencia
  2018-06-27 21:59 ` [PATCH v1 net-next 01/14] net: Clear skb->tstamp only on the forwarding path Jesus Sanchez-Palencia
@ 2018-06-27 21:59 ` Jesus Sanchez-Palencia
  2018-06-27 22:16   ` Eric Dumazet
                     ` (2 more replies)
  2018-06-27 21:59 ` [PATCH v1 net-next 03/14] net: ipv4: Hook into time based transmission Jesus Sanchez-Palencia
                   ` (11 subsequent siblings)
  13 siblings, 3 replies; 31+ messages in thread
From: Jesus Sanchez-Palencia @ 2018-06-27 21:59 UTC (permalink / raw)
  To: netdev
  Cc: tglx, jan.altenberg, vinicius.gomes, kurt.kanzenbach, henrik,
	richardcochran, levi.pearson, ilias.apalodimas, ivan.khoronzhuk,
	mlichvar, willemb, jhs, xiyou.wangcong, jiri, Richard Cochran,
	Jesus Sanchez-Palencia

From: Richard Cochran <rcochran@linutronix.de>

This patch introduces SO_TXTIME. User space enables this option in
order to pass a desired future transmit time in a CMSG when calling
sendmsg(2). The argument to this socket option is a 6-bytes long struct
defined as:

struct sock_txtime {
	clockid_t 	clockid;
	u16		flags;
};

Note that two new fields were added to struct sock by filling a 4-bytes
hole found in the struct. For that reason, neither the struct size or
number of cachelines were altered.

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
---
 arch/alpha/include/uapi/asm/socket.h  |  3 +++
 arch/ia64/include/uapi/asm/socket.h   |  3 +++
 arch/mips/include/uapi/asm/socket.h   |  3 +++
 arch/parisc/include/uapi/asm/socket.h |  3 +++
 arch/s390/include/uapi/asm/socket.h   |  3 +++
 arch/sparc/include/uapi/asm/socket.h  |  3 +++
 arch/xtensa/include/uapi/asm/socket.h |  3 +++
 include/linux/socket.h                |  5 +++++
 include/net/sock.h                    |  8 +++++++
 include/uapi/asm-generic/socket.h     |  3 +++
 net/core/sock.c                       | 32 +++++++++++++++++++++++++++
 11 files changed, 69 insertions(+)

diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index be14f16149d5..065fb372e355 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -112,4 +112,7 @@
 
 #define SO_ZEROCOPY		60
 
+#define SO_TXTIME		61
+#define SCM_TXTIME		SO_TXTIME
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/ia64/include/uapi/asm/socket.h b/arch/ia64/include/uapi/asm/socket.h
index 3efba40adc54..c872c4e6bafb 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -114,4 +114,7 @@
 
 #define SO_ZEROCOPY		60
 
+#define SO_TXTIME		61
+#define SCM_TXTIME		SO_TXTIME
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index 49c3d4795963..71370fb3ceef 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -123,4 +123,7 @@
 
 #define SO_ZEROCOPY		60
 
+#define SO_TXTIME		61
+#define SCM_TXTIME		SO_TXTIME
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index 1d0fdc3b5d22..061b9cf2a779 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -104,4 +104,7 @@
 
 #define SO_ZEROCOPY		0x4035
 
+#define SO_TXTIME		0x4036
+#define SCM_TXTIME		SO_TXTIME
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/s390/include/uapi/asm/socket.h b/arch/s390/include/uapi/asm/socket.h
index 3510c0fd06f4..39d901476ee5 100644
--- a/arch/s390/include/uapi/asm/socket.h
+++ b/arch/s390/include/uapi/asm/socket.h
@@ -111,4 +111,7 @@
 
 #define SO_ZEROCOPY		60
 
+#define SO_TXTIME		61
+#define SCM_TXTIME		SO_TXTIME
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index d58520c2e6ff..7ea35e5601b6 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -101,6 +101,9 @@
 
 #define SO_ZEROCOPY		0x003e
 
+#define SO_TXTIME		0x003f
+#define SCM_TXTIME		SO_TXTIME
+
 /* Security levels - as per NRL IPv6 - don't actually do anything */
 #define SO_SECURITY_AUTHENTICATION		0x5001
 #define SO_SECURITY_ENCRYPTION_TRANSPORT	0x5002
diff --git a/arch/xtensa/include/uapi/asm/socket.h b/arch/xtensa/include/uapi/asm/socket.h
index 75a07b8119a9..1de07a7f7680 100644
--- a/arch/xtensa/include/uapi/asm/socket.h
+++ b/arch/xtensa/include/uapi/asm/socket.h
@@ -116,4 +116,7 @@
 
 #define SO_ZEROCOPY		60
 
+#define SO_TXTIME		61
+#define SCM_TXTIME		SO_TXTIME
+
 #endif	/* _XTENSA_SOCKET_H */
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 7ed4713d5337..ca476b7a8ff0 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -83,6 +83,11 @@ struct cmsghdr {
         int		cmsg_type;	/* protocol-specific type */
 };
 
+struct sock_txtime {
+	clockid_t	clockid;	/* reference clockid */
+	u16		flags;		/* bit 0: txtime in deadline_mode */
+};
+
 /*
  *	Ancillary data object information MACROS
  *	Table 5-14 of POSIX 1003.1g
diff --git a/include/net/sock.h b/include/net/sock.h
index b3b75419eafe..73f4404e49e4 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -315,6 +315,7 @@ struct sock_common {
   *	@sk_destruct: called at sock freeing time, i.e. when all refcnt == 0
   *	@sk_reuseport_cb: reuseport group container
   *	@sk_rcu: used during RCU grace period
+  *	@sk_txtime: used by time-based scheduling
   */
 struct sock {
 	/*
@@ -468,6 +469,11 @@ struct sock {
 	u8			sk_shutdown;
 	u32			sk_tskey;
 	atomic_t		sk_zckey;
+
+	u16			sk_clockid;
+	u16			sk_txtime_flags;
+#define SK_TXTIME_DEADLINE_MASK	BIT(0)
+
 	struct socket		*sk_socket;
 	void			*sk_user_data;
 #ifdef CONFIG_SECURITY
@@ -783,6 +789,7 @@ enum sock_flags {
 	SOCK_FILTER_LOCKED, /* Filter cannot be changed anymore */
 	SOCK_SELECT_ERR_QUEUE, /* Wake select on error queue */
 	SOCK_RCU_FREE, /* wait rcu grace period in sk_destruct() */
+	SOCK_TXTIME,
 };
 
 #define SK_FLAGS_TIMESTAMP ((1UL << SOCK_TIMESTAMP) | (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE))
@@ -1578,6 +1585,7 @@ void sock_kzfree_s(struct sock *sk, void *mem, int size);
 void sk_send_sigurg(struct sock *sk);
 
 struct sockcm_cookie {
+	u64 transmit_time;
 	u32 mark;
 	u16 tsflags;
 };
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 0ae758c90e54..a12692e5f7a8 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -107,4 +107,7 @@
 
 #define SO_ZEROCOPY		60
 
+#define SO_TXTIME		61
+#define SCM_TXTIME		SO_TXTIME
+
 #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/net/core/sock.c b/net/core/sock.c
index bcc41829a16d..ef934d808941 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -91,6 +91,7 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <asm/unaligned.h>
 #include <linux/capability.h>
 #include <linux/errno.h>
 #include <linux/errqueue.h>
@@ -697,6 +698,7 @@ EXPORT_SYMBOL(sk_mc_loop);
 int sock_setsockopt(struct socket *sock, int level, int optname,
 		    char __user *optval, unsigned int optlen)
 {
+	struct sock_txtime sk_txtime;
 	struct sock *sk = sock->sk;
 	int val;
 	int valbool;
@@ -1070,6 +1072,22 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 		}
 		break;
 
+	case SO_TXTIME:
+		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) {
+			ret = -EPERM;
+		} else if (optlen != sizeof(struct sock_txtime)) {
+			ret = -EINVAL;
+		} else if (copy_from_user(&sk_txtime, optval,
+			   sizeof(struct sock_txtime))) {
+			ret = -EFAULT;
+			sock_valbool_flag(sk, SOCK_TXTIME, false);
+		} else {
+			sock_valbool_flag(sk, SOCK_TXTIME, true);
+			sk->sk_clockid = sk_txtime.clockid;
+			sk->sk_txtime_flags = sk_txtime.flags;
+		}
+		break;
+
 	default:
 		ret = -ENOPROTOOPT;
 		break;
@@ -1115,6 +1133,7 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		u64 val64;
 		struct linger ling;
 		struct timeval tm;
+		struct sock_txtime txtime;
 	} v;
 
 	int lv = sizeof(int);
@@ -1403,6 +1422,12 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		v.val = sock_flag(sk, SOCK_ZEROCOPY);
 		break;
 
+	case SO_TXTIME:
+		lv = sizeof(v.txtime);
+		v.txtime.clockid = sk->sk_clockid;
+		v.txtime.flags = sk->sk_txtime_flags;
+		break;
+
 	default:
 		/* We implement the SO_SNDLOWAT etc to not be settable
 		 * (1003.1g 7).
@@ -2137,6 +2162,13 @@ int __sock_cmsg_send(struct sock *sk, struct msghdr *msg, struct cmsghdr *cmsg,
 		sockc->tsflags &= ~SOF_TIMESTAMPING_TX_RECORD_MASK;
 		sockc->tsflags |= tsflags;
 		break;
+	case SCM_TXTIME:
+		if (!sock_flag(sk, SOCK_TXTIME))
+			return -EINVAL;
+		if (cmsg->cmsg_len != CMSG_LEN(sizeof(u64)))
+			return -EINVAL;
+		sockc->transmit_time = get_unaligned((u64 *)CMSG_DATA(cmsg));
+		break;
 	/* SCM_RIGHTS and SCM_CREDENTIALS are semantically in SOL_UNIX. */
 	case SCM_RIGHTS:
 	case SCM_CREDENTIALS:
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v1 net-next 03/14] net: ipv4: Hook into time based transmission
  2018-06-27 21:59 [PATCH v1 net-next 00/14] Scheduled packet Transmission: ETF Jesus Sanchez-Palencia
  2018-06-27 21:59 ` [PATCH v1 net-next 01/14] net: Clear skb->tstamp only on the forwarding path Jesus Sanchez-Palencia
  2018-06-27 21:59 ` [PATCH v1 net-next 02/14] net: Add a new socket option for a future transmit time Jesus Sanchez-Palencia
@ 2018-06-27 21:59 ` Jesus Sanchez-Palencia
  2018-06-28 14:26   ` Willem de Bruijn
  2018-06-27 21:59 ` [PATCH v1 net-next 04/14] net: packet: " Jesus Sanchez-Palencia
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 31+ messages in thread
From: Jesus Sanchez-Palencia @ 2018-06-27 21:59 UTC (permalink / raw)
  To: netdev
  Cc: tglx, jan.altenberg, vinicius.gomes, kurt.kanzenbach, henrik,
	richardcochran, levi.pearson, ilias.apalodimas, ivan.khoronzhuk,
	mlichvar, willemb, jhs, xiyou.wangcong, jiri, Richard Cochran,
	Jesus Sanchez-Palencia

Add a transmit_time field to struct inet_cork, then copy the
timestamp from the CMSG cookie at ip_setup_cork() so we can
safely copy it into the skb later during __ip_make_skb().

For the raw fast path, just perform the copy at raw_send_hdrinc().

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
---
 include/net/inet_sock.h | 1 +
 net/ipv4/ip_output.c    | 3 +++
 net/ipv4/raw.c          | 2 ++
 net/ipv4/udp.c          | 1 +
 4 files changed, 7 insertions(+)

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 83d5b3c2ac42..314be484c696 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -148,6 +148,7 @@ struct inet_cork {
 	__s16			tos;
 	char			priority;
 	__u16			gso_size;
+	u64			transmit_time;
 };
 
 struct inet_cork_full {
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index b3308e9d9762..904a54a090e9 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1153,6 +1153,7 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
 	cork->tos = ipc->tos;
 	cork->priority = ipc->priority;
 	cork->tx_flags = ipc->tx_flags;
+	cork->transmit_time = ipc->sockc.transmit_time;
 
 	return 0;
 }
@@ -1413,6 +1414,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
 
 	skb->priority = (cork->tos != -1) ? cork->priority: sk->sk_priority;
 	skb->mark = sk->sk_mark;
+	skb->tstamp = cork->transmit_time;
 	/*
 	 * Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec
 	 * on dst refcount
@@ -1495,6 +1497,7 @@ struct sk_buff *ip_make_skb(struct sock *sk,
 	cork->flags = 0;
 	cork->addr = 0;
 	cork->opt = NULL;
+	cork->transmit_time = 0;
 	err = ip_setup_cork(sk, cork, ipc, rtp);
 	if (err)
 		return ERR_PTR(err);
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index abb3c9490c55..446af7be2b55 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -381,6 +381,7 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
 
 	skb->priority = sk->sk_priority;
 	skb->mark = sk->sk_mark;
+	skb->tstamp = sockc->transmit_time;
 	skb_dst_set(skb, &rt->dst);
 	*rtp = NULL;
 
@@ -562,6 +563,7 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	}
 
 	ipc.sockc.tsflags = sk->sk_tsflags;
+	ipc.sockc.transmit_time = 0;
 	ipc.addr = inet->inet_saddr;
 	ipc.opt = NULL;
 	ipc.tx_flags = 0;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 9bb27df4dac5..0ab2c13bc7a1 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -978,6 +978,7 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	}
 
 	ipc.sockc.tsflags = sk->sk_tsflags;
+	ipc.sockc.transmit_time = 0;
 	ipc.addr = inet->inet_saddr;
 	ipc.oif = sk->sk_bound_dev_if;
 	ipc.gso_size = up->gso_size;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v1 net-next 04/14] net: packet: Hook into time based transmission.
  2018-06-27 21:59 [PATCH v1 net-next 00/14] Scheduled packet Transmission: ETF Jesus Sanchez-Palencia
                   ` (2 preceding siblings ...)
  2018-06-27 21:59 ` [PATCH v1 net-next 03/14] net: ipv4: Hook into time based transmission Jesus Sanchez-Palencia
@ 2018-06-27 21:59 ` Jesus Sanchez-Palencia
  2018-06-27 21:59 ` [PATCH v1 net-next 05/14] net/sched: Allow creating a Qdisc watchdog with other clocks Jesus Sanchez-Palencia
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Jesus Sanchez-Palencia @ 2018-06-27 21:59 UTC (permalink / raw)
  To: netdev
  Cc: tglx, jan.altenberg, vinicius.gomes, kurt.kanzenbach, henrik,
	richardcochran, levi.pearson, ilias.apalodimas, ivan.khoronzhuk,
	mlichvar, willemb, jhs, xiyou.wangcong, jiri, Richard Cochran,
	Jesus Sanchez-Palencia

From: Richard Cochran <rcochran@linutronix.de>

For raw layer-2 packets, copy the desired future transmit time from
the CMSG cookie into the skb.

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
---
 net/packet/af_packet.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index ff8e7e245c37..255c0164e0aa 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1951,6 +1951,7 @@ static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg,
 		goto out_unlock;
 	}
 
+	sockc.transmit_time = 0;
 	sockc.tsflags = sk->sk_tsflags;
 	if (msg->msg_controllen) {
 		err = sock_cmsg_send(sk, msg, &sockc);
@@ -1962,6 +1963,7 @@ static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg,
 	skb->dev = dev;
 	skb->priority = sk->sk_priority;
 	skb->mark = sk->sk_mark;
+	skb->tstamp = sockc.transmit_time;
 
 	sock_tx_timestamp(sk, sockc.tsflags, &skb_shinfo(skb)->tx_flags);
 
@@ -2457,6 +2459,7 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
 	skb->dev = dev;
 	skb->priority = po->sk.sk_priority;
 	skb->mark = po->sk.sk_mark;
+	skb->tstamp = sockc->transmit_time;
 	sock_tx_timestamp(&po->sk, sockc->tsflags, &skb_shinfo(skb)->tx_flags);
 	skb_shinfo(skb)->destructor_arg = ph.raw;
 
@@ -2633,6 +2636,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 	if (unlikely(!(dev->flags & IFF_UP)))
 		goto out_put;
 
+	sockc.transmit_time = 0;
 	sockc.tsflags = po->sk.sk_tsflags;
 	if (msg->msg_controllen) {
 		err = sock_cmsg_send(&po->sk, msg, &sockc);
@@ -2829,6 +2833,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
 	if (unlikely(!(dev->flags & IFF_UP)))
 		goto out_unlock;
 
+	sockc.transmit_time = 0;
 	sockc.tsflags = sk->sk_tsflags;
 	sockc.mark = sk->sk_mark;
 	if (msg->msg_controllen) {
@@ -2903,6 +2908,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
 	skb->dev = dev;
 	skb->priority = sk->sk_priority;
 	skb->mark = sockc.mark;
+	skb->tstamp = sockc.transmit_time;
 
 	if (has_vnet_hdr) {
 		err = virtio_net_hdr_to_skb(skb, &vnet_hdr, vio_le());
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v1 net-next 05/14] net/sched: Allow creating a Qdisc watchdog with other clocks
  2018-06-27 21:59 [PATCH v1 net-next 00/14] Scheduled packet Transmission: ETF Jesus Sanchez-Palencia
                   ` (3 preceding siblings ...)
  2018-06-27 21:59 ` [PATCH v1 net-next 04/14] net: packet: " Jesus Sanchez-Palencia
@ 2018-06-27 21:59 ` Jesus Sanchez-Palencia
  2018-06-27 21:59 ` [PATCH v1 net-next 06/14] net/sched: Introduce the ETF Qdisc Jesus Sanchez-Palencia
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Jesus Sanchez-Palencia @ 2018-06-27 21:59 UTC (permalink / raw)
  To: netdev
  Cc: tglx, jan.altenberg, vinicius.gomes, kurt.kanzenbach, henrik,
	richardcochran, levi.pearson, ilias.apalodimas, ivan.khoronzhuk,
	mlichvar, willemb, jhs, xiyou.wangcong, jiri

From: Vinicius Costa Gomes <vinicius.gomes@intel.com>

This adds 'qdisc_watchdog_init_clockid()' that allows a clockid to be
passed, this allows other time references to be used when scheduling
the Qdisc to run.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 include/net/pkt_sched.h |  2 ++
 net/sched/sch_api.c     | 11 +++++++++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 815b92a23936..2466ea143d01 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -72,6 +72,8 @@ struct qdisc_watchdog {
 	struct Qdisc	*qdisc;
 };
 
+void qdisc_watchdog_init_clockid(struct qdisc_watchdog *wd, struct Qdisc *qdisc,
+				 clockid_t clockid);
 void qdisc_watchdog_init(struct qdisc_watchdog *wd, struct Qdisc *qdisc);
 void qdisc_watchdog_schedule_ns(struct qdisc_watchdog *wd, u64 expires);
 
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 54eca685420f..98541c6399db 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -596,12 +596,19 @@ static enum hrtimer_restart qdisc_watchdog(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
-void qdisc_watchdog_init(struct qdisc_watchdog *wd, struct Qdisc *qdisc)
+void qdisc_watchdog_init_clockid(struct qdisc_watchdog *wd, struct Qdisc *qdisc,
+				 clockid_t clockid)
 {
-	hrtimer_init(&wd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED);
+	hrtimer_init(&wd->timer, clockid, HRTIMER_MODE_ABS_PINNED);
 	wd->timer.function = qdisc_watchdog;
 	wd->qdisc = qdisc;
 }
+EXPORT_SYMBOL(qdisc_watchdog_init_clockid);
+
+void qdisc_watchdog_init(struct qdisc_watchdog *wd, struct Qdisc *qdisc)
+{
+	qdisc_watchdog_init_clockid(wd, qdisc, CLOCK_MONOTONIC);
+}
 EXPORT_SYMBOL(qdisc_watchdog_init);
 
 void qdisc_watchdog_schedule_ns(struct qdisc_watchdog *wd, u64 expires)
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v1 net-next 06/14] net/sched: Introduce the ETF Qdisc
  2018-06-27 21:59 [PATCH v1 net-next 00/14] Scheduled packet Transmission: ETF Jesus Sanchez-Palencia
                   ` (4 preceding siblings ...)
  2018-06-27 21:59 ` [PATCH v1 net-next 05/14] net/sched: Allow creating a Qdisc watchdog with other clocks Jesus Sanchez-Palencia
@ 2018-06-27 21:59 ` Jesus Sanchez-Palencia
  2018-06-27 21:59 ` [PATCH v1 net-next 07/14] net/sched: Add HW offloading capability to ETF Jesus Sanchez-Palencia
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Jesus Sanchez-Palencia @ 2018-06-27 21:59 UTC (permalink / raw)
  To: netdev
  Cc: tglx, jan.altenberg, vinicius.gomes, kurt.kanzenbach, henrik,
	richardcochran, levi.pearson, ilias.apalodimas, ivan.khoronzhuk,
	mlichvar, willemb, jhs, xiyou.wangcong, jiri,
	Jesus Sanchez-Palencia

From: Vinicius Costa Gomes <vinicius.gomes@intel.com>

The ETF (Earliest TxTime First) qdisc uses the information added
earlier in this series (the socket option SO_TXTIME and the new
role of sk_buff->tstamp) to schedule packets transmission based
on absolute time.

For some workloads, just bandwidth enforcement is not enough, and
precise control of the transmission of packets is necessary.

Example:

$ tc qdisc replace dev enp2s0 parent root handle 100 mqprio num_tc 3 \
           map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0

$ tc qdisc add dev enp2s0 parent 100:1 etf delta 100000 \
           clockid CLOCK_REALTIME

In this example, the Qdisc will provide SW best-effort for the control
of the transmission time to the network adapter, the time stamp in the
socket will be in reference to the clockid CLOCK_REALTIME and packets
will leave the qdisc "delta" (100000) nanoseconds before its transmission
time.

The ETF qdisc will buffer packets sorted by their txtime. It will drop
packets on enqueue() if their skbuff clockid does not match the clock
reference of the Qdisc. Moreover, on dequeue(), a packet will be dropped
if it expires while being enqueued.

The qdisc also supports the SO_TXTIME deadline mode. For this mode, it
will dequeue a packet as soon as possible and change the skb timestamp
to 'now' during etf_dequeue().

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 include/linux/netdevice.h      |   1 +
 include/uapi/linux/pkt_sched.h |  17 ++
 net/sched/Kconfig              |  11 +
 net/sched/Makefile             |   1 +
 net/sched/sch_etf.c            | 385 +++++++++++++++++++++++++++++++++
 5 files changed, 415 insertions(+)
 create mode 100644 net/sched/sch_etf.c

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c6b377a15869..7f650bdc6ec3 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -793,6 +793,7 @@ enum tc_setup_type {
 	TC_SETUP_QDISC_RED,
 	TC_SETUP_QDISC_PRIO,
 	TC_SETUP_QDISC_MQ,
+	TC_SETUP_QDISC_ETF,
 };
 
 /* These structures hold the attributes of bpf state that are being passed
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 37b5096ae97b..9d6fd2004a03 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -934,4 +934,21 @@ enum {
 
 #define TCA_CBS_MAX (__TCA_CBS_MAX - 1)
 
+
+/* ETF */
+struct tc_etf_qopt {
+	__s32 delta;
+	__s32 clockid;
+	__u32 flags;
+#define TC_ETF_DEADLINE_MODE_ON	BIT(0)
+};
+
+enum {
+	TCA_ETF_UNSPEC,
+	TCA_ETF_PARMS,
+	__TCA_ETF_MAX,
+};
+
+#define TCA_ETF_MAX (__TCA_ETF_MAX - 1)
+
 #endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index a01169fb5325..fcc89706745b 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -183,6 +183,17 @@ config NET_SCH_CBS
 	  To compile this code as a module, choose M here: the
 	  module will be called sch_cbs.
 
+config NET_SCH_ETF
+	tristate "Earliest TxTime First (ETF)"
+	help
+	  Say Y here if you want to use the Earliest TxTime First (ETF) packet
+	  scheduling algorithm.
+
+	  See the top of <file:net/sched/sch_etf.c> for more details.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called sch_etf.
+
 config NET_SCH_GRED
 	tristate "Generic Random Early Detection (GRED)"
 	---help---
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 8811d3804878..9a5a7077d217 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -54,6 +54,7 @@ obj-$(CONFIG_NET_SCH_FQ)	+= sch_fq.o
 obj-$(CONFIG_NET_SCH_HHF)	+= sch_hhf.o
 obj-$(CONFIG_NET_SCH_PIE)	+= sch_pie.o
 obj-$(CONFIG_NET_SCH_CBS)	+= sch_cbs.o
+obj-$(CONFIG_NET_SCH_ETF)	+= sch_etf.o
 
 obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)	+= cls_route.o
diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
new file mode 100644
index 000000000000..5f01a285f399
--- /dev/null
+++ b/net/sched/sch_etf.c
@@ -0,0 +1,385 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/* net/sched/sch_etf.c  Earliest TxTime First queueing discipline.
+ *
+ * Authors:	Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
+ *		Vinicius Costa Gomes <vinicius.gomes@intel.com>
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/rbtree.h>
+#include <linux/skbuff.h>
+#include <linux/posix-timers.h>
+#include <net/netlink.h>
+#include <net/sch_generic.h>
+#include <net/pkt_sched.h>
+#include <net/sock.h>
+
+#define DEADLINE_MODE_IS_ON(x) ((x)->flags & TC_ETF_DEADLINE_MODE_ON)
+
+struct etf_sched_data {
+	bool deadline_mode;
+	int clockid;
+	int queue;
+	s32 delta; /* in ns */
+	ktime_t last; /* The txtime of the last skb sent to the netdevice. */
+	struct rb_root head;
+	struct qdisc_watchdog watchdog;
+	ktime_t (*get_time)(void);
+};
+
+static const struct nla_policy etf_policy[TCA_ETF_MAX + 1] = {
+	[TCA_ETF_PARMS]	= { .len = sizeof(struct tc_etf_qopt) },
+};
+
+static inline int validate_input_params(struct tc_etf_qopt *qopt,
+					struct netlink_ext_ack *extack)
+{
+	/* Check if params comply to the following rules:
+	 *	* Clockid and delta must be valid.
+	 *
+	 *	* Dynamic clockids are not supported.
+	 *
+	 *	* Delta must be a positive integer.
+	 */
+	if (qopt->clockid < 0) {
+		NL_SET_ERR_MSG(extack, "Dynamic clockids are not supported");
+		return -ENOTSUPP;
+	}
+
+	if (qopt->clockid >= MAX_CLOCKS) {
+		NL_SET_ERR_MSG(extack, "Invalid clockid");
+		return -EINVAL;
+	}
+
+	if (qopt->delta < 0) {
+		NL_SET_ERR_MSG(extack, "Delta must be positive");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static bool is_packet_valid(struct Qdisc *sch, struct sk_buff *nskb)
+{
+	struct etf_sched_data *q = qdisc_priv(sch);
+	ktime_t txtime = nskb->tstamp;
+	struct sock *sk = nskb->sk;
+	ktime_t now;
+
+	if (!sk)
+		return false;
+
+	if (!sock_flag(sk, SOCK_TXTIME))
+		return false;
+
+	/* We don't perform crosstimestamping.
+	 * Drop if packet's clockid differs from qdisc's.
+	 */
+	if (sk->sk_clockid != q->clockid)
+		return false;
+
+	if ((sk->sk_txtime_flags & SK_TXTIME_DEADLINE_MASK) !=
+	    q->deadline_mode)
+		return false;
+
+	now = q->get_time();
+	if (ktime_before(txtime, now) || ktime_before(txtime, q->last))
+		return false;
+
+	return true;
+}
+
+static struct sk_buff *etf_peek_timesortedlist(struct Qdisc *sch)
+{
+	struct etf_sched_data *q = qdisc_priv(sch);
+	struct rb_node *p;
+
+	p = rb_first(&q->head);
+	if (!p)
+		return NULL;
+
+	return rb_to_skb(p);
+}
+
+static void reset_watchdog(struct Qdisc *sch)
+{
+	struct etf_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb = etf_peek_timesortedlist(sch);
+	ktime_t next;
+
+	if (!skb)
+		return;
+
+	next = ktime_sub_ns(skb->tstamp, q->delta);
+	qdisc_watchdog_schedule_ns(&q->watchdog, ktime_to_ns(next));
+}
+
+static int etf_enqueue_timesortedlist(struct sk_buff *nskb, struct Qdisc *sch,
+				      struct sk_buff **to_free)
+{
+	struct etf_sched_data *q = qdisc_priv(sch);
+	struct rb_node **p = &q->head.rb_node, *parent = NULL;
+	ktime_t txtime = nskb->tstamp;
+
+	if (!is_packet_valid(sch, nskb))
+		return qdisc_drop(nskb, sch, to_free);
+
+	while (*p) {
+		struct sk_buff *skb;
+
+		parent = *p;
+		skb = rb_to_skb(parent);
+		if (ktime_after(txtime, skb->tstamp))
+			p = &parent->rb_right;
+		else
+			p = &parent->rb_left;
+	}
+	rb_link_node(&nskb->rbnode, parent, p);
+	rb_insert_color(&nskb->rbnode, &q->head);
+
+	qdisc_qstats_backlog_inc(sch, nskb);
+	sch->q.qlen++;
+
+	/* Now we may need to re-arm the qdisc watchdog for the next packet. */
+	reset_watchdog(sch);
+
+	return NET_XMIT_SUCCESS;
+}
+
+static void timesortedlist_erase(struct Qdisc *sch, struct sk_buff *skb,
+				 bool drop)
+{
+	struct etf_sched_data *q = qdisc_priv(sch);
+
+	rb_erase(&skb->rbnode, &q->head);
+
+	/* The rbnode field in the skb re-uses these fields, now that
+	 * we are done with the rbnode, reset them.
+	 */
+	skb->next = NULL;
+	skb->prev = NULL;
+	skb->dev = qdisc_dev(sch);
+
+	qdisc_qstats_backlog_dec(sch, skb);
+
+	if (drop) {
+		struct sk_buff *to_free = NULL;
+
+		qdisc_drop(skb, sch, &to_free);
+		kfree_skb_list(to_free);
+		qdisc_qstats_overlimit(sch);
+	} else {
+		qdisc_bstats_update(sch, skb);
+
+		q->last = skb->tstamp;
+	}
+
+	sch->q.qlen--;
+}
+
+static struct sk_buff *etf_dequeue_timesortedlist(struct Qdisc *sch)
+{
+	struct etf_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb;
+	ktime_t now, next;
+
+	skb = etf_peek_timesortedlist(sch);
+	if (!skb)
+		return NULL;
+
+	now = q->get_time();
+
+	/* Drop if packet has expired while in queue. */
+	/* FIXME: Must return error on the socket's error queue */
+	if (ktime_before(skb->tstamp, now)) {
+		timesortedlist_erase(sch, skb, true);
+		skb = NULL;
+		goto out;
+	}
+
+	/* When in deadline mode, dequeue as soon as possible and change the
+	 * txtime from deadline to (now + delta).
+	 */
+	if (q->deadline_mode) {
+		timesortedlist_erase(sch, skb, false);
+		skb->tstamp = now;
+		goto out;
+	}
+
+	next = ktime_sub_ns(skb->tstamp, q->delta);
+
+	/* Dequeue only if now is within the [txtime - delta, txtime] range. */
+	if (ktime_after(now, next))
+		timesortedlist_erase(sch, skb, false);
+	else
+		skb = NULL;
+
+out:
+	/* Now we may need to re-arm the qdisc watchdog for the next packet. */
+	reset_watchdog(sch);
+
+	return skb;
+}
+
+static int etf_init(struct Qdisc *sch, struct nlattr *opt,
+		    struct netlink_ext_ack *extack)
+{
+	struct etf_sched_data *q = qdisc_priv(sch);
+	struct net_device *dev = qdisc_dev(sch);
+	struct nlattr *tb[TCA_ETF_MAX + 1];
+	struct tc_etf_qopt *qopt;
+	int err;
+
+	if (!opt) {
+		NL_SET_ERR_MSG(extack,
+			       "Missing ETF qdisc options which are mandatory");
+		return -EINVAL;
+	}
+
+	err = nla_parse_nested(tb, TCA_ETF_MAX, opt, etf_policy, extack);
+	if (err < 0)
+		return err;
+
+	if (!tb[TCA_ETF_PARMS]) {
+		NL_SET_ERR_MSG(extack, "Missing mandatory ETF parameters");
+		return -EINVAL;
+	}
+
+	qopt = nla_data(tb[TCA_ETF_PARMS]);
+
+	pr_debug("delta %d clockid %d deadline %s\n",
+		 qopt->delta, qopt->clockid,
+		 DEADLINE_MODE_IS_ON(qopt) ? "on" : "off");
+
+	err = validate_input_params(qopt, extack);
+	if (err < 0)
+		return err;
+
+	q->queue = sch->dev_queue - netdev_get_tx_queue(dev, 0);
+
+	/* Everything went OK, save the parameters used. */
+	q->delta = qopt->delta;
+	q->clockid = qopt->clockid;
+	q->deadline_mode = DEADLINE_MODE_IS_ON(qopt);
+
+	switch (q->clockid) {
+	case CLOCK_REALTIME:
+		q->get_time = ktime_get_real;
+		break;
+	case CLOCK_MONOTONIC:
+		q->get_time = ktime_get;
+		break;
+	case CLOCK_BOOTTIME:
+		q->get_time = ktime_get_boottime;
+		break;
+	case CLOCK_TAI:
+		q->get_time = ktime_get_clocktai;
+		break;
+	default:
+		NL_SET_ERR_MSG(extack, "Clockid is not supported");
+		return -ENOTSUPP;
+	}
+
+	qdisc_watchdog_init_clockid(&q->watchdog, sch, q->clockid);
+
+	return 0;
+}
+
+static void timesortedlist_clear(struct Qdisc *sch)
+{
+	struct etf_sched_data *q = qdisc_priv(sch);
+	struct rb_node *p = rb_first(&q->head);
+
+	while (p) {
+		struct sk_buff *skb = rb_to_skb(p);
+
+		p = rb_next(p);
+
+		rb_erase(&skb->rbnode, &q->head);
+		rtnl_kfree_skbs(skb, skb);
+		sch->q.qlen--;
+	}
+}
+
+static void etf_reset(struct Qdisc *sch)
+{
+	struct etf_sched_data *q = qdisc_priv(sch);
+
+	/* Only cancel watchdog if it's been initialized. */
+	if (q->watchdog.qdisc == sch)
+		qdisc_watchdog_cancel(&q->watchdog);
+
+	/* No matter which mode we are on, it's safe to clear both lists. */
+	timesortedlist_clear(sch);
+	__qdisc_reset_queue(&sch->q);
+
+	sch->qstats.backlog = 0;
+	sch->q.qlen = 0;
+
+	q->last = 0;
+}
+
+static void etf_destroy(struct Qdisc *sch)
+{
+	struct etf_sched_data *q = qdisc_priv(sch);
+
+	/* Only cancel watchdog if it's been initialized. */
+	if (q->watchdog.qdisc == sch)
+		qdisc_watchdog_cancel(&q->watchdog);
+}
+
+static int etf_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct etf_sched_data *q = qdisc_priv(sch);
+	struct tc_etf_qopt opt = { };
+	struct nlattr *nest;
+
+	nest = nla_nest_start(skb, TCA_OPTIONS);
+	if (!nest)
+		goto nla_put_failure;
+
+	opt.delta = q->delta;
+	opt.clockid = q->clockid;
+	if (q->deadline_mode)
+		opt.flags |= TC_ETF_DEADLINE_MODE_ON;
+
+	if (nla_put(skb, TCA_ETF_PARMS, sizeof(opt), &opt))
+		goto nla_put_failure;
+
+	return nla_nest_end(skb, nest);
+
+nla_put_failure:
+	nla_nest_cancel(skb, nest);
+	return -1;
+}
+
+static struct Qdisc_ops etf_qdisc_ops __read_mostly = {
+	.id		=	"etf",
+	.priv_size	=	sizeof(struct etf_sched_data),
+	.enqueue	=	etf_enqueue_timesortedlist,
+	.dequeue	=	etf_dequeue_timesortedlist,
+	.peek		=	etf_peek_timesortedlist,
+	.init		=	etf_init,
+	.reset		=	etf_reset,
+	.destroy	=	etf_destroy,
+	.dump		=	etf_dump,
+	.owner		=	THIS_MODULE,
+};
+
+static int __init etf_module_init(void)
+{
+	return register_qdisc(&etf_qdisc_ops);
+}
+
+static void __exit etf_module_exit(void)
+{
+	unregister_qdisc(&etf_qdisc_ops);
+}
+module_init(etf_module_init)
+module_exit(etf_module_exit)
+MODULE_LICENSE("GPL");
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v1 net-next 07/14] net/sched: Add HW offloading capability to ETF
  2018-06-27 21:59 [PATCH v1 net-next 00/14] Scheduled packet Transmission: ETF Jesus Sanchez-Palencia
                   ` (5 preceding siblings ...)
  2018-06-27 21:59 ` [PATCH v1 net-next 06/14] net/sched: Introduce the ETF Qdisc Jesus Sanchez-Palencia
@ 2018-06-27 21:59 ` Jesus Sanchez-Palencia
  2018-06-27 21:59 ` [PATCH v1 net-next 08/14] igb: Refactor igb_configure_cbs() Jesus Sanchez-Palencia
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Jesus Sanchez-Palencia @ 2018-06-27 21:59 UTC (permalink / raw)
  To: netdev
  Cc: tglx, jan.altenberg, vinicius.gomes, kurt.kanzenbach, henrik,
	richardcochran, levi.pearson, ilias.apalodimas, ivan.khoronzhuk,
	mlichvar, willemb, jhs, xiyou.wangcong, jiri,
	Jesus Sanchez-Palencia

Add infra so etf qdisc supports HW offload of time-based transmission.

For hw offload, the time sorted list is still used, so packets are
dequeued always in order of txtime.

Example:

$ tc qdisc replace dev enp2s0 parent root handle 100 mqprio num_tc 3 \
           map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0

$ tc qdisc add dev enp2s0 parent 100:1 etf offload delta 100000 \
	   clockid CLOCK_REALTIME

In this example, the Qdisc will use HW offload for the control of the
transmission time through the network adapter. The hrtimer used for
packets scheduling inside the qdisc will use the clockid CLOCK_REALTIME
as reference and packets leave the Qdisc "delta" (100000) nanoseconds
before their transmission time. Because this will be using HW offload and
since dynamic clocks are not supported by the hrtimer, the system clock
and the PHC clock must be synchronized for this mode to behave as
expected.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
---
 include/net/pkt_sched.h        |  5 +++
 include/uapi/linux/pkt_sched.h |  1 +
 net/sched/sch_etf.c            | 71 +++++++++++++++++++++++++++++++++-
 3 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 2466ea143d01..7dc769e5452b 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -155,4 +155,9 @@ struct tc_cbs_qopt_offload {
 	s32 sendslope;
 };
 
+struct tc_etf_qopt_offload {
+	u8 enable;
+	s32 queue;
+};
+
 #endif
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 9d6fd2004a03..efad482e69d2 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -941,6 +941,7 @@ struct tc_etf_qopt {
 	__s32 clockid;
 	__u32 flags;
 #define TC_ETF_DEADLINE_MODE_ON	BIT(0)
+#define TC_ETF_OFFLOAD_ON	BIT(1)
 };
 
 enum {
diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
index 5f01a285f399..cd6cb5b69228 100644
--- a/net/sched/sch_etf.c
+++ b/net/sched/sch_etf.c
@@ -20,8 +20,10 @@
 #include <net/sock.h>
 
 #define DEADLINE_MODE_IS_ON(x) ((x)->flags & TC_ETF_DEADLINE_MODE_ON)
+#define OFFLOAD_IS_ON(x) ((x)->flags & TC_ETF_OFFLOAD_ON)
 
 struct etf_sched_data {
+	bool offload;
 	bool deadline_mode;
 	int clockid;
 	int queue;
@@ -45,6 +47,9 @@ static inline int validate_input_params(struct tc_etf_qopt *qopt,
 	 *	* Dynamic clockids are not supported.
 	 *
 	 *	* Delta must be a positive integer.
+	 *
+	 * Also note that for the HW offload case, we must
+	 * expect that system clocks have been synchronized to PHC.
 	 */
 	if (qopt->clockid < 0) {
 		NL_SET_ERR_MSG(extack, "Dynamic clockids are not supported");
@@ -226,6 +231,56 @@ static struct sk_buff *etf_dequeue_timesortedlist(struct Qdisc *sch)
 	return skb;
 }
 
+static void etf_disable_offload(struct net_device *dev,
+				struct etf_sched_data *q)
+{
+	struct tc_etf_qopt_offload etf = { };
+	const struct net_device_ops *ops;
+	int err;
+
+	if (!q->offload)
+		return;
+
+	ops = dev->netdev_ops;
+	if (!ops->ndo_setup_tc)
+		return;
+
+	etf.queue = q->queue;
+	etf.enable = 0;
+
+	err = ops->ndo_setup_tc(dev, TC_SETUP_QDISC_ETF, &etf);
+	if (err < 0)
+		pr_warn("Couldn't disable ETF offload for queue %d\n",
+			etf.queue);
+}
+
+static int etf_enable_offload(struct net_device *dev, struct etf_sched_data *q,
+			      struct netlink_ext_ack *extack)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+	struct tc_etf_qopt_offload etf = { };
+	int err;
+
+	if (q->offload)
+		return 0;
+
+	if (!ops->ndo_setup_tc) {
+		NL_SET_ERR_MSG(extack, "Specified device does not support ETF offload");
+		return -EOPNOTSUPP;
+	}
+
+	etf.queue = q->queue;
+	etf.enable = 1;
+
+	err = ops->ndo_setup_tc(dev, TC_SETUP_QDISC_ETF, &etf);
+	if (err < 0) {
+		NL_SET_ERR_MSG(extack, "Specified device failed to setup ETF hardware offload");
+		return err;
+	}
+
+	return 0;
+}
+
 static int etf_init(struct Qdisc *sch, struct nlattr *opt,
 		    struct netlink_ext_ack *extack)
 {
@@ -252,8 +307,9 @@ static int etf_init(struct Qdisc *sch, struct nlattr *opt,
 
 	qopt = nla_data(tb[TCA_ETF_PARMS]);
 
-	pr_debug("delta %d clockid %d deadline %s\n",
+	pr_debug("delta %d clockid %d offload %s deadline %s\n",
 		 qopt->delta, qopt->clockid,
+		 OFFLOAD_IS_ON(qopt) ? "on" : "off",
 		 DEADLINE_MODE_IS_ON(qopt) ? "on" : "off");
 
 	err = validate_input_params(qopt, extack);
@@ -262,9 +318,16 @@ static int etf_init(struct Qdisc *sch, struct nlattr *opt,
 
 	q->queue = sch->dev_queue - netdev_get_tx_queue(dev, 0);
 
+	if (OFFLOAD_IS_ON(qopt)) {
+		err = etf_enable_offload(dev, q, extack);
+		if (err < 0)
+			return err;
+	}
+
 	/* Everything went OK, save the parameters used. */
 	q->delta = qopt->delta;
 	q->clockid = qopt->clockid;
+	q->offload = OFFLOAD_IS_ON(qopt);
 	q->deadline_mode = DEADLINE_MODE_IS_ON(qopt);
 
 	switch (q->clockid) {
@@ -327,10 +390,13 @@ static void etf_reset(struct Qdisc *sch)
 static void etf_destroy(struct Qdisc *sch)
 {
 	struct etf_sched_data *q = qdisc_priv(sch);
+	struct net_device *dev = qdisc_dev(sch);
 
 	/* Only cancel watchdog if it's been initialized. */
 	if (q->watchdog.qdisc == sch)
 		qdisc_watchdog_cancel(&q->watchdog);
+
+	etf_disable_offload(dev, q);
 }
 
 static int etf_dump(struct Qdisc *sch, struct sk_buff *skb)
@@ -345,6 +411,9 @@ static int etf_dump(struct Qdisc *sch, struct sk_buff *skb)
 
 	opt.delta = q->delta;
 	opt.clockid = q->clockid;
+	if (q->offload)
+		opt.flags |= TC_ETF_OFFLOAD_ON;
+
 	if (q->deadline_mode)
 		opt.flags |= TC_ETF_DEADLINE_MODE_ON;
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v1 net-next 08/14] igb: Refactor igb_configure_cbs()
  2018-06-27 21:59 [PATCH v1 net-next 00/14] Scheduled packet Transmission: ETF Jesus Sanchez-Palencia
                   ` (6 preceding siblings ...)
  2018-06-27 21:59 ` [PATCH v1 net-next 07/14] net/sched: Add HW offloading capability to ETF Jesus Sanchez-Palencia
@ 2018-06-27 21:59 ` Jesus Sanchez-Palencia
  2018-06-27 21:59 ` [PATCH v1 net-next 09/14] igb: Only change Tx arbitration when CBS is on Jesus Sanchez-Palencia
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Jesus Sanchez-Palencia @ 2018-06-27 21:59 UTC (permalink / raw)
  To: netdev
  Cc: tglx, jan.altenberg, vinicius.gomes, kurt.kanzenbach, henrik,
	richardcochran, levi.pearson, ilias.apalodimas, ivan.khoronzhuk,
	mlichvar, willemb, jhs, xiyou.wangcong, jiri,
	Jesus Sanchez-Palencia

Make this function retrieve what it needs from the Tx ring being
addressed since it already relies on what had been saved on it before.
Also, since this function will be used by the upcoming Launchtime
patches rename it to better reflect its intention. Note that
Launchtime is not part of what 802.1Qav specifies, but the i210
datasheet refers to this set of functionality as "Qav Transmission
Mode".

Here we also perform a tiny refactor at is_any_cbs_enabled(), and add
further documentation to igb_setup_tx_mode().

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 60 +++++++++++------------
 1 file changed, 28 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index f1e3397bd405..15f6b9c57ccf 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -1655,23 +1655,17 @@ static void set_queue_mode(struct e1000_hw *hw, int queue, enum queue_mode mode)
 }
 
 /**
- *  igb_configure_cbs - Configure Credit-Based Shaper (CBS)
+ *  igb_config_tx_modes - Configure "Qav Tx mode" features on igb
  *  @adapter: pointer to adapter struct
  *  @queue: queue number
- *  @enable: true = enable CBS, false = disable CBS
- *  @idleslope: idleSlope in kbps
- *  @sendslope: sendSlope in kbps
- *  @hicredit: hiCredit in bytes
- *  @locredit: loCredit in bytes
  *
- *  Configure CBS for a given hardware queue. When disabling, idleslope,
- *  sendslope, hicredit, locredit arguments are ignored. Returns 0 if
- *  success. Negative otherwise.
+ *  Configure CBS for a given hardware queue. Parameters are retrieved
+ *  from the correct Tx ring, so igb_save_cbs_params() should be used
+ *  for setting those correctly prior to this function being called.
  **/
-static void igb_configure_cbs(struct igb_adapter *adapter, int queue,
-			      bool enable, int idleslope, int sendslope,
-			      int hicredit, int locredit)
+static void igb_config_tx_modes(struct igb_adapter *adapter, int queue)
 {
+	struct igb_ring *ring = adapter->tx_ring[queue];
 	struct net_device *netdev = adapter->netdev;
 	struct e1000_hw *hw = &adapter->hw;
 	u32 tqavcc;
@@ -1680,7 +1674,7 @@ static void igb_configure_cbs(struct igb_adapter *adapter, int queue,
 	WARN_ON(hw->mac.type != e1000_i210);
 	WARN_ON(queue < 0 || queue > 1);
 
-	if (enable || queue == 0) {
+	if (ring->cbs_enable || queue == 0) {
 		/* i210 does not allow the queue 0 to be in the Strict
 		 * Priority mode while the Qav mode is enabled, so,
 		 * instead of disabling strict priority mode, we give
@@ -1690,10 +1684,10 @@ static void igb_configure_cbs(struct igb_adapter *adapter, int queue,
 		 * Queue0 QueueMode must be set to 1b when
 		 * TransmitMode is set to Qav."
 		 */
-		if (queue == 0 && !enable) {
+		if (queue == 0 && !ring->cbs_enable) {
 			/* max "linkspeed" idleslope in kbps */
-			idleslope = 1000000;
-			hicredit = ETH_FRAME_LEN;
+			ring->idleslope = 1000000;
+			ring->hicredit = ETH_FRAME_LEN;
 		}
 
 		set_tx_desc_fetch_prio(hw, queue, TX_QUEUE_PRIO_HIGH);
@@ -1756,14 +1750,15 @@ static void igb_configure_cbs(struct igb_adapter *adapter, int queue,
 		 *       calculated value, so the resulting bandwidth might
 		 *       be slightly higher for some configurations.
 		 */
-		value = DIV_ROUND_UP_ULL(idleslope * 61034ULL, 1000000);
+		value = DIV_ROUND_UP_ULL(ring->idleslope * 61034ULL, 1000000);
 
 		tqavcc = rd32(E1000_I210_TQAVCC(queue));
 		tqavcc &= ~E1000_TQAVCC_IDLESLOPE_MASK;
 		tqavcc |= value;
 		wr32(E1000_I210_TQAVCC(queue), tqavcc);
 
-		wr32(E1000_I210_TQAVHC(queue), 0x80000000 + hicredit * 0x7735);
+		wr32(E1000_I210_TQAVHC(queue),
+		     0x80000000 + ring->hicredit * 0x7735);
 	} else {
 		set_tx_desc_fetch_prio(hw, queue, TX_QUEUE_PRIO_LOW);
 		set_queue_mode(hw, queue, QUEUE_MODE_STRICT_PRIORITY);
@@ -1783,8 +1778,9 @@ static void igb_configure_cbs(struct igb_adapter *adapter, int queue,
 	 */
 
 	netdev_dbg(netdev, "CBS %s: queue %d idleslope %d sendslope %d hiCredit %d locredit %d\n",
-		   (enable) ? "enabled" : "disabled", queue,
-		   idleslope, sendslope, hicredit, locredit);
+		   (ring->cbs_enable) ? "enabled" : "disabled", queue,
+		   ring->idleslope, ring->sendslope, ring->hicredit,
+		   ring->locredit);
 }
 
 static int igb_save_cbs_params(struct igb_adapter *adapter, int queue,
@@ -1809,19 +1805,25 @@ static int igb_save_cbs_params(struct igb_adapter *adapter, int queue,
 
 static bool is_any_cbs_enabled(struct igb_adapter *adapter)
 {
-	struct igb_ring *ring;
 	int i;
 
 	for (i = 0; i < adapter->num_tx_queues; i++) {
-		ring = adapter->tx_ring[i];
-
-		if (ring->cbs_enable)
+		if (adapter->tx_ring[i]->cbs_enable)
 			return true;
 	}
 
 	return false;
 }
 
+/**
+ *  igb_setup_tx_mode - Switch to/from Qav Tx mode when applicable
+ *  @adapter: pointer to adapter struct
+ *
+ *  Configure TQAVCTRL register switching the controller's Tx mode
+ *  if FQTSS mode is enabled or disabled. Additionally, will issue
+ *  a call to igb_config_tx_modes() per queue so any previously saved
+ *  Tx parameters are applied.
+ **/
 static void igb_setup_tx_mode(struct igb_adapter *adapter)
 {
 	struct net_device *netdev = adapter->netdev;
@@ -1881,11 +1883,7 @@ static void igb_setup_tx_mode(struct igb_adapter *adapter)
 			    adapter->num_tx_queues : I210_SR_QUEUES_NUM;
 
 		for (i = 0; i < max_queue; i++) {
-			struct igb_ring *ring = adapter->tx_ring[i];
-
-			igb_configure_cbs(adapter, i, ring->cbs_enable,
-					  ring->idleslope, ring->sendslope,
-					  ring->hicredit, ring->locredit);
+			igb_config_tx_modes(adapter, i);
 		}
 	} else {
 		wr32(E1000_RXPBS, I210_RXPBSIZE_DEFAULT);
@@ -2480,9 +2478,7 @@ static int igb_offload_cbs(struct igb_adapter *adapter,
 		return err;
 
 	if (is_fqtss_enabled(adapter)) {
-		igb_configure_cbs(adapter, qopt->queue, qopt->enable,
-				  qopt->idleslope, qopt->sendslope,
-				  qopt->hicredit, qopt->locredit);
+		igb_config_tx_modes(adapter, qopt->queue);
 
 		if (!is_any_cbs_enabled(adapter))
 			enable_fqtss(adapter, false);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v1 net-next 09/14] igb: Only change Tx arbitration when CBS is on
  2018-06-27 21:59 [PATCH v1 net-next 00/14] Scheduled packet Transmission: ETF Jesus Sanchez-Palencia
                   ` (7 preceding siblings ...)
  2018-06-27 21:59 ` [PATCH v1 net-next 08/14] igb: Refactor igb_configure_cbs() Jesus Sanchez-Palencia
@ 2018-06-27 21:59 ` Jesus Sanchez-Palencia
  2018-06-27 21:59 ` [PATCH v1 net-next 10/14] igb: Refactor igb_offload_cbs() Jesus Sanchez-Palencia
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Jesus Sanchez-Palencia @ 2018-06-27 21:59 UTC (permalink / raw)
  To: netdev
  Cc: tglx, jan.altenberg, vinicius.gomes, kurt.kanzenbach, henrik,
	richardcochran, levi.pearson, ilias.apalodimas, ivan.khoronzhuk,
	mlichvar, willemb, jhs, xiyou.wangcong, jiri,
	Jesus Sanchez-Palencia

Currently the data transmission arbitration algorithm - DataTranARB
field on TQAVCTRL reg - is always set to CBS when the Tx mode is
changed from legacy to 'Qav' mode.

Make that configuration a bit more granular in preparation for the
upcoming Launchtime enabling patches, since CBS and Launchtime can be
enabled separately. That is achieved by moving the DataTranARB setup
to igb_config_tx_modes() instead.

Similarly, when disabling CBS we must check if it has been disabled
for all queues, and clear the DataTranARB accordingly.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 49 +++++++++++++++--------
 1 file changed, 33 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 15f6b9c57ccf..8c90f1e51add 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -1654,6 +1654,18 @@ static void set_queue_mode(struct e1000_hw *hw, int queue, enum queue_mode mode)
 	wr32(E1000_I210_TQAVCC(queue), val);
 }
 
+static bool is_any_cbs_enabled(struct igb_adapter *adapter)
+{
+	int i;
+
+	for (i = 0; i < adapter->num_tx_queues; i++) {
+		if (adapter->tx_ring[i]->cbs_enable)
+			return true;
+	}
+
+	return false;
+}
+
 /**
  *  igb_config_tx_modes - Configure "Qav Tx mode" features on igb
  *  @adapter: pointer to adapter struct
@@ -1668,7 +1680,7 @@ static void igb_config_tx_modes(struct igb_adapter *adapter, int queue)
 	struct igb_ring *ring = adapter->tx_ring[queue];
 	struct net_device *netdev = adapter->netdev;
 	struct e1000_hw *hw = &adapter->hw;
-	u32 tqavcc;
+	u32 tqavcc, tqavctrl;
 	u16 value;
 
 	WARN_ON(hw->mac.type != e1000_i210);
@@ -1693,6 +1705,14 @@ static void igb_config_tx_modes(struct igb_adapter *adapter, int queue)
 		set_tx_desc_fetch_prio(hw, queue, TX_QUEUE_PRIO_HIGH);
 		set_queue_mode(hw, queue, QUEUE_MODE_STREAM_RESERVATION);
 
+		/* Always set data transfer arbitration to credit-based
+		 * shaper algorithm on TQAVCTRL if CBS is enabled for any of
+		 * the queues.
+		 */
+		tqavctrl = rd32(E1000_I210_TQAVCTRL);
+		tqavctrl |= E1000_TQAVCTRL_DATATRANARB;
+		wr32(E1000_I210_TQAVCTRL, tqavctrl);
+
 		/* According to i210 datasheet section 7.2.7.7, we should set
 		 * the 'idleSlope' field from TQAVCC register following the
 		 * equation:
@@ -1770,6 +1790,16 @@ static void igb_config_tx_modes(struct igb_adapter *adapter, int queue)
 
 		/* Set hiCredit to zero. */
 		wr32(E1000_I210_TQAVHC(queue), 0);
+
+		/* If CBS is not enabled for any queues anymore, then return to
+		 * the default state of Data Transmission Arbitration on
+		 * TQAVCTRL.
+		 */
+		if (!is_any_cbs_enabled(adapter)) {
+			tqavctrl = rd32(E1000_I210_TQAVCTRL);
+			tqavctrl &= ~E1000_TQAVCTRL_DATATRANARB;
+			wr32(E1000_I210_TQAVCTRL, tqavctrl);
+		}
 	}
 
 	/* XXX: In i210 controller the sendSlope and loCredit parameters from
@@ -1803,18 +1833,6 @@ static int igb_save_cbs_params(struct igb_adapter *adapter, int queue,
 	return 0;
 }
 
-static bool is_any_cbs_enabled(struct igb_adapter *adapter)
-{
-	int i;
-
-	for (i = 0; i < adapter->num_tx_queues; i++) {
-		if (adapter->tx_ring[i]->cbs_enable)
-			return true;
-	}
-
-	return false;
-}
-
 /**
  *  igb_setup_tx_mode - Switch to/from Qav Tx mode when applicable
  *  @adapter: pointer to adapter struct
@@ -1838,11 +1856,10 @@ static void igb_setup_tx_mode(struct igb_adapter *adapter)
 		int i, max_queue;
 
 		/* Configure TQAVCTRL register: set transmit mode to 'Qav',
-		 * set data fetch arbitration to 'round robin' and set data
-		 * transfer arbitration to 'credit shaper algorithm.
+		 * set data fetch arbitration to 'round robin'.
 		 */
 		val = rd32(E1000_I210_TQAVCTRL);
-		val |= E1000_TQAVCTRL_XMIT_MODE | E1000_TQAVCTRL_DATATRANARB;
+		val |= E1000_TQAVCTRL_XMIT_MODE;
 		val &= ~E1000_TQAVCTRL_DATAFETCHARB;
 		wr32(E1000_I210_TQAVCTRL, val);
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v1 net-next 10/14] igb: Refactor igb_offload_cbs()
  2018-06-27 21:59 [PATCH v1 net-next 00/14] Scheduled packet Transmission: ETF Jesus Sanchez-Palencia
                   ` (8 preceding siblings ...)
  2018-06-27 21:59 ` [PATCH v1 net-next 09/14] igb: Only change Tx arbitration when CBS is on Jesus Sanchez-Palencia
@ 2018-06-27 21:59 ` Jesus Sanchez-Palencia
  2018-06-27 21:59 ` [PATCH v1 net-next 11/14] igb: Add support for ETF offload Jesus Sanchez-Palencia
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Jesus Sanchez-Palencia @ 2018-06-27 21:59 UTC (permalink / raw)
  To: netdev
  Cc: tglx, jan.altenberg, vinicius.gomes, kurt.kanzenbach, henrik,
	richardcochran, levi.pearson, ilias.apalodimas, ivan.khoronzhuk,
	mlichvar, willemb, jhs, xiyou.wangcong, jiri,
	Jesus Sanchez-Palencia

Split code into a separate function (igb_offload_apply()) that will be
used by ETF offload implementation.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 8c90f1e51add..c30ab7b260cc 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2474,6 +2474,19 @@ igb_features_check(struct sk_buff *skb, struct net_device *dev,
 	return features;
 }
 
+static void igb_offload_apply(struct igb_adapter *adapter, s32 queue)
+{
+	if (!is_fqtss_enabled(adapter)) {
+		enable_fqtss(adapter, true);
+		return;
+	}
+
+	igb_config_tx_modes(adapter, queue);
+
+	if (!is_any_cbs_enabled(adapter))
+		enable_fqtss(adapter, false);
+}
+
 static int igb_offload_cbs(struct igb_adapter *adapter,
 			   struct tc_cbs_qopt_offload *qopt)
 {
@@ -2494,15 +2507,7 @@ static int igb_offload_cbs(struct igb_adapter *adapter,
 	if (err)
 		return err;
 
-	if (is_fqtss_enabled(adapter)) {
-		igb_config_tx_modes(adapter, qopt->queue);
-
-		if (!is_any_cbs_enabled(adapter))
-			enable_fqtss(adapter, false);
-
-	} else {
-		enable_fqtss(adapter, true);
-	}
+	igb_offload_apply(adapter, qopt->queue);
 
 	return 0;
 }
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v1 net-next 11/14] igb: Add support for ETF offload
  2018-06-27 21:59 [PATCH v1 net-next 00/14] Scheduled packet Transmission: ETF Jesus Sanchez-Palencia
                   ` (9 preceding siblings ...)
  2018-06-27 21:59 ` [PATCH v1 net-next 10/14] igb: Refactor igb_offload_cbs() Jesus Sanchez-Palencia
@ 2018-06-27 21:59 ` Jesus Sanchez-Palencia
  2018-06-27 21:59 ` [PATCH v1 net-next 12/14] igb: Only call skb_tx_timestamp after descriptors are ready Jesus Sanchez-Palencia
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 31+ messages in thread
From: Jesus Sanchez-Palencia @ 2018-06-27 21:59 UTC (permalink / raw)
  To: netdev
  Cc: tglx, jan.altenberg, vinicius.gomes, kurt.kanzenbach, henrik,
	richardcochran, levi.pearson, ilias.apalodimas, ivan.khoronzhuk,
	mlichvar, willemb, jhs, xiyou.wangcong, jiri,
	Jesus Sanchez-Palencia

Implement HW offload support for SO_TXTIME through igb's Launchtime
feature. This is done by extending igb_setup_tc() so it supports
TC_SETUP_QDISC_ETF and configuring i210 so time based transmit
arbitration is enabled.

The FQTSS transmission mode added before is extended so strict
priority (SP) queues wait for stream reservation (SR) ones.
igb_config_tx_modes() is extended so it can support enabling/disabling
Launchtime following the previous approach used for the credit-based
shaper (CBS).

As the previous flow, FQTSS transmission mode is enabled automatically
by the driver once Launchtime (or CBS, as before) is enabled.
Similarly, it's automatically disabled when the feature is disabled
for the last queue that had it setup on.

The driver just consumes the transmit times from the skbuffs directly,
so no special handling is done in case an 'invalid' time is provided.
We assume this has been handled by the ETF qdisc already.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
---
 .../net/ethernet/intel/igb/e1000_defines.h    |  16 ++
 drivers/net/ethernet/intel/igb/igb.h          |   1 +
 drivers/net/ethernet/intel/igb/igb_main.c     | 139 +++++++++++++++---
 3 files changed, 139 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h b/drivers/net/ethernet/intel/igb/e1000_defines.h
index 252440a418dc..8a28f3388f69 100644
--- a/drivers/net/ethernet/intel/igb/e1000_defines.h
+++ b/drivers/net/ethernet/intel/igb/e1000_defines.h
@@ -1048,6 +1048,22 @@
 #define E1000_TQAVCTRL_XMIT_MODE	BIT(0)
 #define E1000_TQAVCTRL_DATAFETCHARB	BIT(4)
 #define E1000_TQAVCTRL_DATATRANARB	BIT(8)
+#define E1000_TQAVCTRL_DATATRANTIM	BIT(9)
+#define E1000_TQAVCTRL_SP_WAIT_SR	BIT(10)
+/* Fetch Time Delta - bits 31:16
+ *
+ * This field holds the value to be reduced from the launch time for
+ * fetch time decision. The FetchTimeDelta value is defined in 32 ns
+ * granularity.
+ *
+ * This field is 16 bits wide, and so the maximum value is:
+ *
+ * 65535 * 32 = 2097120 ~= 2.1 msec
+ *
+ * XXX: We are configuring the max value here since we couldn't come up
+ * with a reason for not doing so.
+ */
+#define E1000_TQAVCTRL_FETCHTIME_DELTA	(0xFFFF << 16)
 
 /* TX Qav Credit Control fields */
 #define E1000_TQAVCC_IDLESLOPE_MASK	0xFFFF
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 9643b5b3d444..ca54e268d157 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -262,6 +262,7 @@ struct igb_ring {
 	u16 count;			/* number of desc. in the ring */
 	u8 queue_index;			/* logical index of the ring*/
 	u8 reg_idx;			/* physical index of the ring */
+	bool launchtime_enable;		/* true if LaunchTime is enabled */
 	bool cbs_enable;		/* indicates if CBS is enabled */
 	s32 idleslope;			/* idleSlope in kbps */
 	s32 sendslope;			/* sendSlope in kbps */
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index c30ab7b260cc..9b9a6a6227e0 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -1666,13 +1666,26 @@ static bool is_any_cbs_enabled(struct igb_adapter *adapter)
 	return false;
 }
 
+static bool is_any_txtime_enabled(struct igb_adapter *adapter)
+{
+	int i;
+
+	for (i = 0; i < adapter->num_tx_queues; i++) {
+		if (adapter->tx_ring[i]->launchtime_enable)
+			return true;
+	}
+
+	return false;
+}
+
 /**
  *  igb_config_tx_modes - Configure "Qav Tx mode" features on igb
  *  @adapter: pointer to adapter struct
  *  @queue: queue number
  *
- *  Configure CBS for a given hardware queue. Parameters are retrieved
- *  from the correct Tx ring, so igb_save_cbs_params() should be used
+ *  Configure CBS and Launchtime for a given hardware queue.
+ *  Parameters are retrieved from the correct Tx ring, so
+ *  igb_save_cbs_params() and igb_save_txtime_params() should be used
  *  for setting those correctly prior to this function being called.
  **/
 static void igb_config_tx_modes(struct igb_adapter *adapter, int queue)
@@ -1686,6 +1699,19 @@ static void igb_config_tx_modes(struct igb_adapter *adapter, int queue)
 	WARN_ON(hw->mac.type != e1000_i210);
 	WARN_ON(queue < 0 || queue > 1);
 
+	/* If any of the Qav features is enabled, configure queues as SR and
+	 * with HIGH PRIO. If none is, then configure them with LOW PRIO and
+	 * as SP.
+	 */
+	if (ring->cbs_enable || ring->launchtime_enable) {
+		set_tx_desc_fetch_prio(hw, queue, TX_QUEUE_PRIO_HIGH);
+		set_queue_mode(hw, queue, QUEUE_MODE_STREAM_RESERVATION);
+	} else {
+		set_tx_desc_fetch_prio(hw, queue, TX_QUEUE_PRIO_LOW);
+		set_queue_mode(hw, queue, QUEUE_MODE_STRICT_PRIORITY);
+	}
+
+	/* If CBS is enabled, set DataTranARB and config its parameters. */
 	if (ring->cbs_enable || queue == 0) {
 		/* i210 does not allow the queue 0 to be in the Strict
 		 * Priority mode while the Qav mode is enabled, so,
@@ -1702,9 +1728,6 @@ static void igb_config_tx_modes(struct igb_adapter *adapter, int queue)
 			ring->hicredit = ETH_FRAME_LEN;
 		}
 
-		set_tx_desc_fetch_prio(hw, queue, TX_QUEUE_PRIO_HIGH);
-		set_queue_mode(hw, queue, QUEUE_MODE_STREAM_RESERVATION);
-
 		/* Always set data transfer arbitration to credit-based
 		 * shaper algorithm on TQAVCTRL if CBS is enabled for any of
 		 * the queues.
@@ -1780,8 +1803,6 @@ static void igb_config_tx_modes(struct igb_adapter *adapter, int queue)
 		wr32(E1000_I210_TQAVHC(queue),
 		     0x80000000 + ring->hicredit * 0x7735);
 	} else {
-		set_tx_desc_fetch_prio(hw, queue, TX_QUEUE_PRIO_LOW);
-		set_queue_mode(hw, queue, QUEUE_MODE_STRICT_PRIORITY);
 
 		/* Set idleSlope to zero. */
 		tqavcc = rd32(E1000_I210_TQAVCC(queue));
@@ -1802,17 +1823,61 @@ static void igb_config_tx_modes(struct igb_adapter *adapter, int queue)
 		}
 	}
 
+	/* If LaunchTime is enabled, set DataTranTIM. */
+	if (ring->launchtime_enable) {
+		/* Always set DataTranTIM on TQAVCTRL if LaunchTime is enabled
+		 * for any of the SR queues, and configure fetchtime delta.
+		 * XXX NOTE:
+		 *     - LaunchTime will be enabled for all SR queues.
+		 *     - A fixed offset can be added relative to the launch
+		 *       time of all packets if configured at reg LAUNCH_OS0.
+		 *       We are keeping it as 0 for now (default value).
+		 */
+		tqavctrl = rd32(E1000_I210_TQAVCTRL);
+		tqavctrl |= E1000_TQAVCTRL_DATATRANTIM |
+		       E1000_TQAVCTRL_FETCHTIME_DELTA;
+		wr32(E1000_I210_TQAVCTRL, tqavctrl);
+	} else {
+		/* If Launchtime is not enabled for any SR queues anymore,
+		 * then clear DataTranTIM on TQAVCTRL and clear fetchtime delta,
+		 * effectively disabling Launchtime.
+		 */
+		if (!is_any_txtime_enabled(adapter)) {
+			tqavctrl = rd32(E1000_I210_TQAVCTRL);
+			tqavctrl &= ~E1000_TQAVCTRL_DATATRANTIM;
+			tqavctrl &= ~E1000_TQAVCTRL_FETCHTIME_DELTA;
+			wr32(E1000_I210_TQAVCTRL, tqavctrl);
+		}
+	}
+
 	/* XXX: In i210 controller the sendSlope and loCredit parameters from
 	 * CBS are not configurable by software so we don't do any 'controller
 	 * configuration' in respect to these parameters.
 	 */
 
-	netdev_dbg(netdev, "CBS %s: queue %d idleslope %d sendslope %d hiCredit %d locredit %d\n",
-		   (ring->cbs_enable) ? "enabled" : "disabled", queue,
+	netdev_dbg(netdev, "Qav Tx mode: cbs %s, launchtime %s, queue %d \
+			    idleslope %d sendslope %d hiCredit %d \
+			    locredit %d\n",
+		   (ring->cbs_enable) ? "enabled" : "disabled",
+		   (ring->launchtime_enable) ? "enabled" : "disabled", queue,
 		   ring->idleslope, ring->sendslope, ring->hicredit,
 		   ring->locredit);
 }
 
+static int igb_save_txtime_params(struct igb_adapter *adapter, int queue,
+				  bool enable)
+{
+	struct igb_ring *ring;
+
+	if (queue < 0 || queue > adapter->num_tx_queues)
+		return -EINVAL;
+
+	ring = adapter->tx_ring[queue];
+	ring->launchtime_enable = enable;
+
+	return 0;
+}
+
 static int igb_save_cbs_params(struct igb_adapter *adapter, int queue,
 			       bool enable, int idleslope, int sendslope,
 			       int hicredit, int locredit)
@@ -1856,10 +1921,11 @@ static void igb_setup_tx_mode(struct igb_adapter *adapter)
 		int i, max_queue;
 
 		/* Configure TQAVCTRL register: set transmit mode to 'Qav',
-		 * set data fetch arbitration to 'round robin'.
+		 * set data fetch arbitration to 'round robin', set SP_WAIT_SR
+		 * so SP queues wait for SR ones.
 		 */
 		val = rd32(E1000_I210_TQAVCTRL);
-		val |= E1000_TQAVCTRL_XMIT_MODE;
+		val |= E1000_TQAVCTRL_XMIT_MODE | E1000_TQAVCTRL_SP_WAIT_SR;
 		val &= ~E1000_TQAVCTRL_DATAFETCHARB;
 		wr32(E1000_I210_TQAVCTRL, val);
 
@@ -2483,7 +2549,7 @@ static void igb_offload_apply(struct igb_adapter *adapter, s32 queue)
 
 	igb_config_tx_modes(adapter, queue);
 
-	if (!is_any_cbs_enabled(adapter))
+	if (!is_any_cbs_enabled(adapter) && !is_any_txtime_enabled(adapter))
 		enable_fqtss(adapter, false);
 }
 
@@ -2756,6 +2822,30 @@ static int igb_setup_tc_block(struct igb_adapter *adapter,
 	}
 }
 
+static int igb_offload_txtime(struct igb_adapter *adapter,
+			      struct tc_etf_qopt_offload *qopt)
+{
+	struct e1000_hw *hw = &adapter->hw;
+	int err;
+
+	/* Launchtime offloading is only supported by i210 controller. */
+	if (hw->mac.type != e1000_i210)
+		return -EOPNOTSUPP;
+
+	/* Launchtime offloading is only supported by queues 0 and 1. */
+	if (qopt->queue < 0 || qopt->queue > 1)
+		return -EINVAL;
+
+	err = igb_save_txtime_params(adapter, qopt->queue, qopt->enable);
+
+	if (err)
+		return err;
+
+	igb_offload_apply(adapter, qopt->queue);
+
+	return 0;
+}
+
 static int igb_setup_tc(struct net_device *dev, enum tc_setup_type type,
 			void *type_data)
 {
@@ -2766,6 +2856,8 @@ static int igb_setup_tc(struct net_device *dev, enum tc_setup_type type,
 		return igb_offload_cbs(adapter, type_data);
 	case TC_SETUP_BLOCK:
 		return igb_setup_tc_block(adapter, type_data);
+	case TC_SETUP_QDISC_ETF:
+		return igb_offload_txtime(adapter, type_data);
 
 	default:
 		return -EOPNOTSUPP;
@@ -5586,11 +5678,14 @@ static void igb_set_itr(struct igb_q_vector *q_vector)
 	}
 }
 
-static void igb_tx_ctxtdesc(struct igb_ring *tx_ring, u32 vlan_macip_lens,
-			    u32 type_tucmd, u32 mss_l4len_idx)
+static void igb_tx_ctxtdesc(struct igb_ring *tx_ring,
+			    struct igb_tx_buffer *first,
+			    u32 vlan_macip_lens, u32 type_tucmd,
+			    u32 mss_l4len_idx)
 {
 	struct e1000_adv_tx_context_desc *context_desc;
 	u16 i = tx_ring->next_to_use;
+	struct timespec64 ts;
 
 	context_desc = IGB_TX_CTXTDESC(tx_ring, i);
 
@@ -5605,9 +5700,18 @@ static void igb_tx_ctxtdesc(struct igb_ring *tx_ring, u32 vlan_macip_lens,
 		mss_l4len_idx |= tx_ring->reg_idx << 4;
 
 	context_desc->vlan_macip_lens	= cpu_to_le32(vlan_macip_lens);
-	context_desc->seqnum_seed	= 0;
 	context_desc->type_tucmd_mlhl	= cpu_to_le32(type_tucmd);
 	context_desc->mss_l4len_idx	= cpu_to_le32(mss_l4len_idx);
+
+	/* We assume there is always a valid tx time available. Invalid times
+	 * should have been handled by the upper layers.
+	 */
+	if (tx_ring->launchtime_enable) {
+		ts = ns_to_timespec64(first->skb->tstamp);
+		context_desc->seqnum_seed = cpu_to_le32(ts.tv_nsec / 32);
+	} else {
+		context_desc->seqnum_seed = 0;
+	}
 }
 
 static int igb_tso(struct igb_ring *tx_ring,
@@ -5690,7 +5794,8 @@ static int igb_tso(struct igb_ring *tx_ring,
 	vlan_macip_lens |= (ip.hdr - skb->data) << E1000_ADVTXD_MACLEN_SHIFT;
 	vlan_macip_lens |= first->tx_flags & IGB_TX_FLAGS_VLAN_MASK;
 
-	igb_tx_ctxtdesc(tx_ring, vlan_macip_lens, type_tucmd, mss_l4len_idx);
+	igb_tx_ctxtdesc(tx_ring, first, vlan_macip_lens,
+			type_tucmd, mss_l4len_idx);
 
 	return 1;
 }
@@ -5745,7 +5850,7 @@ static void igb_tx_csum(struct igb_ring *tx_ring, struct igb_tx_buffer *first)
 	vlan_macip_lens |= skb_network_offset(skb) << E1000_ADVTXD_MACLEN_SHIFT;
 	vlan_macip_lens |= first->tx_flags & IGB_TX_FLAGS_VLAN_MASK;
 
-	igb_tx_ctxtdesc(tx_ring, vlan_macip_lens, type_tucmd, 0);
+	igb_tx_ctxtdesc(tx_ring, first, vlan_macip_lens, type_tucmd, 0);
 }
 
 #define IGB_SET_FLAG(_input, _flag, _result) \
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v1 net-next 12/14] igb: Only call skb_tx_timestamp after descriptors are ready
  2018-06-27 21:59 [PATCH v1 net-next 00/14] Scheduled packet Transmission: ETF Jesus Sanchez-Palencia
                   ` (10 preceding siblings ...)
  2018-06-27 21:59 ` [PATCH v1 net-next 11/14] igb: Add support for ETF offload Jesus Sanchez-Palencia
@ 2018-06-27 21:59 ` Jesus Sanchez-Palencia
  2018-06-27 23:56   ` Eric Dumazet
  2018-06-27 21:59 ` [PATCH v1 net-next 13/14] net/sched: Enforce usage of CLOCK_TAI for sch_etf Jesus Sanchez-Palencia
  2018-06-27 21:59 ` [PATCH v1 net-next 14/14] net/sched: Make etf report drops on error_queue Jesus Sanchez-Palencia
  13 siblings, 1 reply; 31+ messages in thread
From: Jesus Sanchez-Palencia @ 2018-06-27 21:59 UTC (permalink / raw)
  To: netdev
  Cc: tglx, jan.altenberg, vinicius.gomes, kurt.kanzenbach, henrik,
	richardcochran, levi.pearson, ilias.apalodimas, ivan.khoronzhuk,
	mlichvar, willemb, jhs, xiyou.wangcong, jiri,
	Jesus Sanchez-Palencia

Currently, skb_tx_timestamp() is being called before the DMA
descriptors are prepared in igb_xmit_frame_ring(), which happens
during either the igb_tso() or igb_tx_csum() calls.

Given that now the skb->tstamp might be used to carry the timestamp
for SO_TXTIME, we must only call skb_tx_timestamp() after the
information has been copied into the DMA tx_ring.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 9b9a6a6227e0..0d72f2417143 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6138,8 +6138,6 @@ netdev_tx_t igb_xmit_frame_ring(struct sk_buff *skb,
 		}
 	}
 
-	skb_tx_timestamp(skb);
-
 	if (skb_vlan_tag_present(skb)) {
 		tx_flags |= IGB_TX_FLAGS_VLAN;
 		tx_flags |= (skb_vlan_tag_get(skb) << IGB_TX_FLAGS_VLAN_SHIFT);
@@ -6155,6 +6153,8 @@ netdev_tx_t igb_xmit_frame_ring(struct sk_buff *skb,
 	else if (!tso)
 		igb_tx_csum(tx_ring, first);
 
+	skb_tx_timestamp(skb);
+
 	if (igb_tx_map(tx_ring, first, hdr_len))
 		goto cleanup_tx_tstamp;
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v1 net-next 13/14] net/sched: Enforce usage of CLOCK_TAI for sch_etf
  2018-06-27 21:59 [PATCH v1 net-next 00/14] Scheduled packet Transmission: ETF Jesus Sanchez-Palencia
                   ` (11 preceding siblings ...)
  2018-06-27 21:59 ` [PATCH v1 net-next 12/14] igb: Only call skb_tx_timestamp after descriptors are ready Jesus Sanchez-Palencia
@ 2018-06-27 21:59 ` Jesus Sanchez-Palencia
  2018-06-28 14:26   ` Willem de Bruijn
  2018-06-27 21:59 ` [PATCH v1 net-next 14/14] net/sched: Make etf report drops on error_queue Jesus Sanchez-Palencia
  13 siblings, 1 reply; 31+ messages in thread
From: Jesus Sanchez-Palencia @ 2018-06-27 21:59 UTC (permalink / raw)
  To: netdev
  Cc: tglx, jan.altenberg, vinicius.gomes, kurt.kanzenbach, henrik,
	richardcochran, levi.pearson, ilias.apalodimas, ivan.khoronzhuk,
	mlichvar, willemb, jhs, xiyou.wangcong, jiri,
	Jesus Sanchez-Palencia

The qdisc and the SO_TXTIME ABIs allow for a clockid to be configured,
but it's been decided that usage of CLOCK_TAI should be enforced until
we decide to allow for other clockids to be used. The rationale here is
that PTP times are usually in the TAI scale, thus no other clocks should
be necessary.

For now, the qdisc will return EINVAL if any clocks other than
CLOCK_TAI are used.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
---
 net/sched/sch_etf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
index cd6cb5b69228..5514a8aa3bd5 100644
--- a/net/sched/sch_etf.c
+++ b/net/sched/sch_etf.c
@@ -56,8 +56,8 @@ static inline int validate_input_params(struct tc_etf_qopt *qopt,
 		return -ENOTSUPP;
 	}
 
-	if (qopt->clockid >= MAX_CLOCKS) {
-		NL_SET_ERR_MSG(extack, "Invalid clockid");
+	if (qopt->clockid != CLOCK_TAI) {
+		NL_SET_ERR_MSG(extack, "Invalid clockid. CLOCK_TAI must be used");
 		return -EINVAL;
 	}
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v1 net-next 14/14] net/sched: Make etf report drops on error_queue
  2018-06-27 21:59 [PATCH v1 net-next 00/14] Scheduled packet Transmission: ETF Jesus Sanchez-Palencia
                   ` (12 preceding siblings ...)
  2018-06-27 21:59 ` [PATCH v1 net-next 13/14] net/sched: Enforce usage of CLOCK_TAI for sch_etf Jesus Sanchez-Palencia
@ 2018-06-27 21:59 ` Jesus Sanchez-Palencia
  2018-06-28 14:27   ` Willem de Bruijn
  13 siblings, 1 reply; 31+ messages in thread
From: Jesus Sanchez-Palencia @ 2018-06-27 21:59 UTC (permalink / raw)
  To: netdev
  Cc: tglx, jan.altenberg, vinicius.gomes, kurt.kanzenbach, henrik,
	richardcochran, levi.pearson, ilias.apalodimas, ivan.khoronzhuk,
	mlichvar, willemb, jhs, xiyou.wangcong, jiri,
	Jesus Sanchez-Palencia

Use the socket error queue for reporting dropped packets if the
socket has enabled that feature through the SO_TXTIME API.

Packets are dropped either on enqueue() if they aren't accepted by the
qdisc or on dequeue() if the system misses their deadline. Those are
reported as different errors so applications can react accordingly.

Userspace can retrieve the errors through the socket error queue and the
corresponding cmsg interfaces. A struct sock_extended_err* is used for
returning the error data, and the packet's timestamp can be retrieved by
adding both ee_data and ee_info fields as e.g.:

    ((__u64) serr->ee_data << 32) + serr->ee_info

This feature is disabled by default and must be explicitly enabled by
applications. Enabling it can bring some overhead for the Tx cycles
of the application.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
---
 include/linux/socket.h        |  4 +++-
 include/net/sock.h            |  1 +
 include/uapi/linux/errqueue.h |  2 ++
 net/sched/sch_etf.c           | 37 +++++++++++++++++++++++++++++++++--
 4 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index ca476b7a8ff0..75e11d29b32a 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -85,7 +85,9 @@ struct cmsghdr {
 
 struct sock_txtime {
 	clockid_t	clockid;	/* reference clockid */
-	u16		flags;		/* bit 0: txtime in deadline_mode */
+	u16		flags;		/* bit 0: txtime in deadline_mode
+					 * bit 1: report drops on sk err queue
+					 */
 };
 
 /*
diff --git a/include/net/sock.h b/include/net/sock.h
index 73f4404e49e4..e681a45cfe7e 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -473,6 +473,7 @@ struct sock {
 	u16			sk_clockid;
 	u16			sk_txtime_flags;
 #define SK_TXTIME_DEADLINE_MASK	BIT(0)
+#define SK_TXTIME_RECV_ERR_MASK	BIT(1)
 
 	struct socket		*sk_socket;
 	void			*sk_user_data;
diff --git a/include/uapi/linux/errqueue.h b/include/uapi/linux/errqueue.h
index dc64cfaf13da..66fd5e443c94 100644
--- a/include/uapi/linux/errqueue.h
+++ b/include/uapi/linux/errqueue.h
@@ -25,6 +25,8 @@ struct sock_extended_err {
 #define SO_EE_OFFENDER(ee)	((struct sockaddr*)((ee)+1))
 
 #define SO_EE_CODE_ZEROCOPY_COPIED	1
+#define SO_EE_CODE_TXTIME_INVALID_PARAM	2
+#define SO_EE_CODE_TXTIME_MISSED	3
 
 /**
  *	struct scm_timestamping - timestamps exposed through cmsg
diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
index 5514a8aa3bd5..166f4b72875b 100644
--- a/net/sched/sch_etf.c
+++ b/net/sched/sch_etf.c
@@ -11,6 +11,7 @@
 #include <linux/kernel.h>
 #include <linux/string.h>
 #include <linux/errno.h>
+#include <linux/errqueue.h>
 #include <linux/rbtree.h>
 #include <linux/skbuff.h>
 #include <linux/posix-timers.h>
@@ -124,6 +125,35 @@ static void reset_watchdog(struct Qdisc *sch)
 	qdisc_watchdog_schedule_ns(&q->watchdog, ktime_to_ns(next));
 }
 
+static void report_sock_error(struct sk_buff *skb, u32 err, u8 code)
+{
+	struct sock_exterr_skb *serr;
+	ktime_t txtime = skb->tstamp;
+
+	if (!skb->sk || !(skb->sk->sk_txtime_flags & SK_TXTIME_RECV_ERR_MASK))
+		return;
+
+	skb = skb_clone_sk(skb);
+	if (!skb)
+		return;
+
+	sock_hold(skb->sk);
+
+	serr = SKB_EXT_ERR(skb);
+	serr->ee.ee_errno = err;
+	serr->ee.ee_origin = SO_EE_ORIGIN_LOCAL;
+	serr->ee.ee_type = 0;
+	serr->ee.ee_code = code;
+	serr->ee.ee_pad = 0;
+	serr->ee.ee_data = (txtime >> 32); /* high part of tstamp */
+	serr->ee.ee_info = txtime; /* low part of tstamp */
+
+	if (sock_queue_err_skb(skb->sk, skb))
+		kfree_skb(skb);
+
+	sock_put(skb->sk);
+}
+
 static int etf_enqueue_timesortedlist(struct sk_buff *nskb, struct Qdisc *sch,
 				      struct sk_buff **to_free)
 {
@@ -131,8 +161,10 @@ static int etf_enqueue_timesortedlist(struct sk_buff *nskb, struct Qdisc *sch,
 	struct rb_node **p = &q->head.rb_node, *parent = NULL;
 	ktime_t txtime = nskb->tstamp;
 
-	if (!is_packet_valid(sch, nskb))
+	if (!is_packet_valid(sch, nskb)) {
+		report_sock_error(nskb, EINVAL, SO_EE_CODE_TXTIME_INVALID_PARAM);
 		return qdisc_drop(nskb, sch, to_free);
+	}
 
 	while (*p) {
 		struct sk_buff *skb;
@@ -175,6 +207,8 @@ static void timesortedlist_erase(struct Qdisc *sch, struct sk_buff *skb,
 	if (drop) {
 		struct sk_buff *to_free = NULL;
 
+		report_sock_error(skb, ECANCELED, SO_EE_CODE_TXTIME_MISSED);
+
 		qdisc_drop(skb, sch, &to_free);
 		kfree_skb_list(to_free);
 		qdisc_qstats_overlimit(sch);
@@ -200,7 +234,6 @@ static struct sk_buff *etf_dequeue_timesortedlist(struct Qdisc *sch)
 	now = q->get_time();
 
 	/* Drop if packet has expired while in queue. */
-	/* FIXME: Must return error on the socket's error queue */
 	if (ktime_before(skb->tstamp, now)) {
 		timesortedlist_erase(sch, skb, true);
 		skb = NULL;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v1 net-next 02/14] net: Add a new socket option for a future transmit time.
  2018-06-27 21:59 ` [PATCH v1 net-next 02/14] net: Add a new socket option for a future transmit time Jesus Sanchez-Palencia
@ 2018-06-27 22:16   ` Eric Dumazet
  2018-06-27 23:07     ` Jesus Sanchez-Palencia
  2018-06-28  2:16   ` kbuild test robot
  2018-06-28 14:26   ` Willem de Bruijn
  2 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2018-06-27 22:16 UTC (permalink / raw)
  To: Jesus Sanchez-Palencia, netdev
  Cc: tglx, jan.altenberg, vinicius.gomes, kurt.kanzenbach, henrik,
	richardcochran, levi.pearson, ilias.apalodimas, ivan.khoronzhuk,
	mlichvar, willemb, jhs, xiyou.wangcong, jiri, Richard Cochran



On 06/27/2018 02:59 PM, Jesus Sanchez-Palencia wrote:
> From: Richard Cochran <rcochran@linutronix.de>
> 
> This patch introduces SO_TXTIME. User space enables this option in
> order to pass a desired future transmit time in a CMSG when calling
> sendmsg(2). The argument to this socket option is a 6-bytes long struct
> defined as:
> 
> struct sock_txtime {
> 	clockid_t 	clockid;
> 	u16		flags;
> };

Note that sizeof(struct sock_txtime) is 8, not 6, because of alignments.

This means that your implementation of getsockopt(... SO_TXTIME )
is probably leaking two bytes of kernel stack to user space.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v1 net-next 02/14] net: Add a new socket option for a future transmit time.
  2018-06-27 22:16   ` Eric Dumazet
@ 2018-06-27 23:07     ` Jesus Sanchez-Palencia
  2018-06-28  0:05       ` Eric Dumazet
  0 siblings, 1 reply; 31+ messages in thread
From: Jesus Sanchez-Palencia @ 2018-06-27 23:07 UTC (permalink / raw)
  To: Eric Dumazet, netdev
  Cc: tglx, jan.altenberg, vinicius.gomes, kurt.kanzenbach, henrik,
	richardcochran, ilias.apalodimas, ivan.khoronzhuk, mlichvar,
	willemb, jhs, xiyou.wangcong, jiri

Hi Eric,


On 06/27/2018 03:16 PM, Eric Dumazet wrote:
> 
> 
> On 06/27/2018 02:59 PM, Jesus Sanchez-Palencia wrote:
>> From: Richard Cochran <rcochran@linutronix.de>
>>
>> This patch introduces SO_TXTIME. User space enables this option in
>> order to pass a desired future transmit time in a CMSG when calling
>> sendmsg(2). The argument to this socket option is a 6-bytes long struct
>> defined as:
>>
>> struct sock_txtime {
>> 	clockid_t 	clockid;
>> 	u16		flags;
>> };
> 
> Note that sizeof(struct sock_txtime) is 8, not 6, because of alignments.


Oh yeah, sure.


> 
> This means that your implementation of getsockopt(... SO_TXTIME )
> is probably leaking two bytes of kernel stack to user space.

I'm failing to see how... There is a memset() in sock.c:1147 clearing all the 8
bytes that we later use to (explicitly) assign each member of the struct. Aren't
the 2 extra bytes sanitized, then? What have I missed?


Thanks,
Jesus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v1 net-next 12/14] igb: Only call skb_tx_timestamp after descriptors are ready
  2018-06-27 21:59 ` [PATCH v1 net-next 12/14] igb: Only call skb_tx_timestamp after descriptors are ready Jesus Sanchez-Palencia
@ 2018-06-27 23:56   ` Eric Dumazet
  2018-06-28 17:12     ` Jesus Sanchez-Palencia
  0 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2018-06-27 23:56 UTC (permalink / raw)
  To: Jesus Sanchez-Palencia, netdev
  Cc: tglx, jan.altenberg, vinicius.gomes, kurt.kanzenbach, henrik,
	richardcochran, levi.pearson, ilias.apalodimas, ivan.khoronzhuk,
	mlichvar, willemb, jhs, xiyou.wangcong, jiri



On 06/27/2018 02:59 PM, Jesus Sanchez-Palencia wrote:
> Currently, skb_tx_timestamp() is being called before the DMA
> descriptors are prepared in igb_xmit_frame_ring(), which happens
> during either the igb_tso() or igb_tx_csum() calls.
> 
> Given that now the skb->tstamp might be used to carry the timestamp
> for SO_TXTIME, we must only call skb_tx_timestamp() after the
> information has been copied into the DMA tx_ring.


Since when this skb->tstamp use happened ?

If this is in patch 11/14 (igb: Add support for ETF offload), then you should either :

1) Squash this into 11/14

2) swap 11 and 12 patch, so that this change is done before "igb: Add support for ETF offload"  

Otherwise a bisection could fail badly.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v1 net-next 02/14] net: Add a new socket option for a future transmit time.
  2018-06-27 23:07     ` Jesus Sanchez-Palencia
@ 2018-06-28  0:05       ` Eric Dumazet
  0 siblings, 0 replies; 31+ messages in thread
From: Eric Dumazet @ 2018-06-28  0:05 UTC (permalink / raw)
  To: Jesus Sanchez-Palencia, Eric Dumazet, netdev
  Cc: tglx, jan.altenberg, vinicius.gomes, kurt.kanzenbach, henrik,
	richardcochran, ilias.apalodimas, ivan.khoronzhuk, mlichvar,
	willemb, jhs, xiyou.wangcong, jiri



On 06/27/2018 04:07 PM, Jesus Sanchez-Palencia wrote:
 
> I'm failing to see how... There is a memset() in sock.c:1147 clearing all the 8
> bytes that we later use to (explicitly) assign each member of the struct. Aren't
> the 2 extra bytes sanitized, then? What have I missed?

Nothing, it seems I missed the memset(), it was not seen in the context of your patch
and I have not checked the whole function.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v1 net-next 02/14] net: Add a new socket option for a future transmit time.
  2018-06-27 21:59 ` [PATCH v1 net-next 02/14] net: Add a new socket option for a future transmit time Jesus Sanchez-Palencia
  2018-06-27 22:16   ` Eric Dumazet
@ 2018-06-28  2:16   ` kbuild test robot
  2018-06-28 14:26   ` Willem de Bruijn
  2 siblings, 0 replies; 31+ messages in thread
From: kbuild test robot @ 2018-06-28  2:16 UTC (permalink / raw)
  To: Jesus Sanchez-Palencia
  Cc: kbuild-all, netdev, tglx, jan.altenberg, vinicius.gomes,
	kurt.kanzenbach, henrik, richardcochran, levi.pearson,
	ilias.apalodimas, ivan.khoronzhuk, mlichvar, willemb, jhs,
	xiyou.wangcong, jiri, Richard Cochran, Jesus Sanchez-Palencia

[-- Attachment #1: Type: text/plain, Size: 20657 bytes --]

Hi Richard,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Jesus-Sanchez-Palencia/Scheduled-packet-Transmission-ETF/20180628-061119
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   include/net/mac80211.h:955: warning: Function parameter or member 'status.ampdu_ack_len' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'status.ampdu_len' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'status.antenna' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'status.tx_time' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'status.is_valid_ack_signal' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'status.status_driver_data' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'driver_rates' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'pad' not described in 'ieee80211_tx_info'
   include/net/mac80211.h:955: warning: Function parameter or member 'rate_driver_data' not described in 'ieee80211_tx_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'rx_stats_avg' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'rx_stats_avg.signal' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'rx_stats_avg.chain_signal' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.filtered' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.retry_failed' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.retry_count' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.lost_packets' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.last_tdls_pkt_time' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.msdu_retries' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.msdu_failed' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.last_ack' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.last_ack_signal' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.ack_signal_filled' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.avg_ack_signal' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'tx_stats.packets' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'tx_stats.bytes' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'tx_stats.last_rate' not described in 'sta_info'
   net/mac80211/sta_info.h:588: warning: Function parameter or member 'tx_stats.msdu' not described in 'sta_info'
   kernel/sched/fair.c:3760: warning: Function parameter or member 'flags' not described in 'attach_entity_load_avg'
   include/linux/device.h:93: warning: bad line: this bus.
   include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_excl.cb' not described in 'dma_buf'
   include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_excl.poll' not described in 'dma_buf'
   include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_excl.active' not described in 'dma_buf'
   include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_shared.cb' not described in 'dma_buf'
   include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_shared.poll' not described in 'dma_buf'
   include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_shared.active' not described in 'dma_buf'
   include/linux/dma-fence-array.h:54: warning: Function parameter or member 'work' not described in 'dma_fence_array'
   include/linux/gpio/driver.h:142: warning: Function parameter or member 'request_key' not described in 'gpio_irq_chip'
   include/linux/iio/hw-consumer.h:1: warning: no structured comments found
   include/linux/device.h:94: warning: bad line: this bus.
   include/linux/input/sparse-keymap.h:46: warning: Function parameter or member 'sw' not described in 'key_entry'
   include/linux/regulator/driver.h:227: warning: Function parameter or member 'resume_early' not described in 'regulator_ops'
   drivers/regulator/core.c:4465: warning: Excess function parameter 'state' description in 'regulator_suspend_late'
   arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.esw0' not described in 'irb'
   arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.esw1' not described in 'irb'
   arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.esw2' not described in 'irb'
   arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.esw3' not described in 'irb'
   arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.eadm' not described in 'irb'
   drivers/usb/dwc3/gadget.c:510: warning: Excess function parameter 'dwc' description in 'dwc3_gadget_start_config'
   include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_pin' not described in 'drm_driver'
   include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_unpin' not described in 'drm_driver'
   include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_res_obj' not described in 'drm_driver'
   include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_get_sg_table' not described in 'drm_driver'
   include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_import_sg_table' not described in 'drm_driver'
   include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_vmap' not described in 'drm_driver'
   include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_vunmap' not described in 'drm_driver'
   include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_mmap' not described in 'drm_driver'
   drivers/gpu/drm/i915/i915_vma.h:48: warning: cannot understand function prototype: 'struct i915_vma '
   drivers/gpu/drm/i915/i915_vma.h:1: warning: no structured comments found
   include/drm/tinydrm/tinydrm.h:34: warning: Function parameter or member 'fb_dirty' not described in 'tinydrm_device'
   drivers/gpu/drm/tinydrm/mipi-dbi.c:272: warning: Function parameter or member 'crtc_state' not described in 'mipi_dbi_enable_flush'
   drivers/gpu/drm/tinydrm/mipi-dbi.c:272: warning: Function parameter or member 'plane_state' not described in 'mipi_dbi_enable_flush'
   include/linux/skbuff.h:853: warning: Function parameter or member 'dev_scratch' not described in 'sk_buff'
   include/linux/skbuff.h:853: warning: Function parameter or member 'ip_defrag_offset' not described in 'sk_buff'
   include/linux/skbuff.h:853: warning: Function parameter or member 'list' not described in 'sk_buff'
   include/linux/skbuff.h:853: warning: Function parameter or member 'skb_mstamp' not described in 'sk_buff'
   include/linux/skbuff.h:853: warning: Function parameter or member '__cloned_offset' not described in 'sk_buff'
   include/linux/skbuff.h:853: warning: Function parameter or member 'head_frag' not described in 'sk_buff'
   include/linux/skbuff.h:853: warning: Function parameter or member '__unused' not described in 'sk_buff'
   include/linux/skbuff.h:853: warning: Function parameter or member '__pkt_type_offset' not described in 'sk_buff'
   include/linux/skbuff.h:853: warning: Function parameter or member 'pfmemalloc' not described in 'sk_buff'
   include/linux/skbuff.h:853: warning: Function parameter or member 'encapsulation' not described in 'sk_buff'
   include/linux/skbuff.h:853: warning: Function parameter or member 'encap_hdr_csum' not described in 'sk_buff'
   include/linux/skbuff.h:853: warning: Function parameter or member 'csum_valid' not described in 'sk_buff'
   include/linux/skbuff.h:853: warning: Function parameter or member 'csum_complete_sw' not described in 'sk_buff'
   include/linux/skbuff.h:853: warning: Function parameter or member 'csum_level' not described in 'sk_buff'
   include/linux/skbuff.h:853: warning: Function parameter or member 'inner_protocol_type' not described in 'sk_buff'
   include/linux/skbuff.h:853: warning: Function parameter or member 'remcsum_offload' not described in 'sk_buff'
   include/linux/skbuff.h:853: warning: Function parameter or member 'offload_fwd_mark' not described in 'sk_buff'
   include/linux/skbuff.h:853: warning: Function parameter or member 'offload_mr_fwd_mark' not described in 'sk_buff'
   include/linux/skbuff.h:853: warning: Function parameter or member 'sender_cpu' not described in 'sk_buff'
   include/linux/skbuff.h:853: warning: Function parameter or member 'reserved_tailroom' not described in 'sk_buff'
   include/linux/skbuff.h:853: warning: Function parameter or member 'inner_ipproto' not described in 'sk_buff'
   include/net/sock.h:234: warning: Function parameter or member 'skc_addrpair' not described in 'sock_common'
   include/net/sock.h:234: warning: Function parameter or member 'skc_portpair' not described in 'sock_common'
   include/net/sock.h:234: warning: Function parameter or member 'skc_ipv6only' not described in 'sock_common'
   include/net/sock.h:234: warning: Function parameter or member 'skc_net_refcnt' not described in 'sock_common'
   include/net/sock.h:234: warning: Function parameter or member 'skc_v6_daddr' not described in 'sock_common'
   include/net/sock.h:234: warning: Function parameter or member 'skc_v6_rcv_saddr' not described in 'sock_common'
   include/net/sock.h:234: warning: Function parameter or member 'skc_cookie' not described in 'sock_common'
   include/net/sock.h:234: warning: Function parameter or member 'skc_listener' not described in 'sock_common'
   include/net/sock.h:234: warning: Function parameter or member 'skc_tw_dr' not described in 'sock_common'
   include/net/sock.h:234: warning: Function parameter or member 'skc_rcv_wnd' not described in 'sock_common'
   include/net/sock.h:234: warning: Function parameter or member 'skc_tw_rcv_nxt' not described in 'sock_common'
   include/net/sock.h:499: warning: Function parameter or member 'sk_backlog.rmem_alloc' not described in 'sock'
   include/net/sock.h:499: warning: Function parameter or member 'sk_backlog.len' not described in 'sock'
   include/net/sock.h:499: warning: Function parameter or member 'sk_backlog.head' not described in 'sock'
   include/net/sock.h:499: warning: Function parameter or member 'sk_backlog.tail' not described in 'sock'
   include/net/sock.h:499: warning: Function parameter or member 'sk_wq_raw' not described in 'sock'
   include/net/sock.h:499: warning: Function parameter or member 'tcp_rtx_queue' not described in 'sock'
   include/net/sock.h:499: warning: Function parameter or member 'sk_route_forced_caps' not described in 'sock'
>> include/net/sock.h:499: warning: Function parameter or member 'sk_clockid' not described in 'sock'
>> include/net/sock.h:499: warning: Function parameter or member 'sk_txtime_flags' not described in 'sock'
   include/net/sock.h:499: warning: Function parameter or member 'sk_validate_xmit_skb' not described in 'sock'
   net/core/datagram.c:835: warning: Function parameter or member 'events' not described in 'datagram_poll_mask'
   include/linux/netdevice.h:1998: warning: Function parameter or member 'adj_list.upper' not described in 'net_device'
   include/linux/netdevice.h:1998: warning: Function parameter or member 'adj_list.lower' not described in 'net_device'
   include/linux/netdevice.h:1998: warning: Function parameter or member 'gso_partial_features' not described in 'net_device'
   include/linux/netdevice.h:1998: warning: Function parameter or member 'switchdev_ops' not described in 'net_device'
   include/linux/netdevice.h:1998: warning: Function parameter or member 'l3mdev_ops' not described in 'net_device'
   include/linux/netdevice.h:1998: warning: Function parameter or member 'xfrmdev_ops' not described in 'net_device'
   include/linux/netdevice.h:1998: warning: Function parameter or member 'tlsdev_ops' not described in 'net_device'
   include/linux/netdevice.h:1998: warning: Function parameter or member 'name_assign_type' not described in 'net_device'
   include/linux/netdevice.h:1998: warning: Function parameter or member 'ieee802154_ptr' not described in 'net_device'
   include/linux/netdevice.h:1998: warning: Function parameter or member 'mpls_ptr' not described in 'net_device'
   include/linux/netdevice.h:1998: warning: Function parameter or member 'xdp_prog' not described in 'net_device'
   include/linux/netdevice.h:1998: warning: Function parameter or member 'gro_flush_timeout' not described in 'net_device'
   include/linux/netdevice.h:1998: warning: Function parameter or member 'nf_hooks_ingress' not described in 'net_device'
   include/linux/netdevice.h:1998: warning: Function parameter or member '____cacheline_aligned_in_smp' not described in 'net_device'
   include/linux/netdevice.h:1998: warning: Function parameter or member 'qdisc_hash' not described in 'net_device'
   include/linux/phylink.h:56: warning: Function parameter or member '__ETHTOOL_DECLARE_LINK_MODE_MASK(advertising' not described in 'phylink_link_state'
   include/linux/phylink.h:56: warning: Function parameter or member '__ETHTOOL_DECLARE_LINK_MODE_MASK(lp_advertising' not described in 'phylink_link_state'
   sound/soc/soc-core.c:2787: warning: Excess function parameter 'legacy_dai_naming' description in 'snd_soc_register_dais'
   include/linux/rcupdate.h:571: ERROR: Unexpected indentation.
   include/linux/rcupdate.h:575: ERROR: Unexpected indentation.
   include/linux/rcupdate.h:579: WARNING: Block quote ends without a blank line; unexpected unindent.
   include/linux/rcupdate.h:581: WARNING: Block quote ends without a blank line; unexpected unindent.
   include/linux/rcupdate.h:581: WARNING: Inline literal start-string without end-string.
   lib/reed_solomon/reed_solomon.c:287: ERROR: Unknown target name: "gfp".
   include/linux/wait.h:110: WARNING: Block quote ends without a blank line; unexpected unindent.
   include/linux/wait.h:113: ERROR: Unexpected indentation.
   include/linux/wait.h:115: WARNING: Block quote ends without a blank line; unexpected unindent.
   kernel/time/hrtimer.c:1129: WARNING: Block quote ends without a blank line; unexpected unindent.
   kernel/signal.c:327: WARNING: Inline literal start-string without end-string.
   drivers/video/fbdev/core/modedb.c:647: WARNING: Inline strong start-string without end-string.
   drivers/video/fbdev/core/modedb.c:647: WARNING: Inline strong start-string without end-string.
   drivers/video/fbdev/core/modedb.c:647: WARNING: Inline strong start-string without end-string.
   drivers/video/fbdev/core/modedb.c:647: WARNING: Inline strong start-string without end-string.
   drivers/ata/libata-core.c:5943: ERROR: Unknown target name: "hw".
   drivers/message/fusion/mptbase.c:5054: WARNING: Definition list ends without a blank line; unexpected unindent.
   drivers/tty/serial/serial_core.c:1892: WARNING: Definition list ends without a blank line; unexpected unindent.
   include/linux/mtd/rawnand.h:1446: WARNING: Inline strong start-string without end-string.
   include/linux/mtd/rawnand.h:1448: WARNING: Inline strong start-string without end-string.
   include/linux/regulator/driver.h:279: ERROR: Unknown target name: "regulator_regmap_x_voltage".
   Documentation/driver-api/soundwire/locking.rst:50: ERROR: Inconsistent literal block quoting.
   Documentation/driver-api/soundwire/locking.rst:51: WARNING: Line block ends without a blank line.
   Documentation/driver-api/soundwire/locking.rst:55: WARNING: Inline substitution_reference start-string without end-string.
   Documentation/driver-api/soundwire/locking.rst:56: WARNING: Line block ends without a blank line.
   Documentation/driver-api/soundwire/stream.rst:177: WARNING: Explicit markup ends without a blank line; unexpected unindent.
   Documentation/driver-api/soundwire/stream.rst:203: WARNING: Explicit markup ends without a blank line; unexpected unindent.
   Documentation/driver-api/soundwire/stream.rst:248: WARNING: Explicit markup ends without a blank line; unexpected unindent.
   Documentation/driver-api/soundwire/stream.rst:277: WARNING: Explicit markup ends without a blank line; unexpected unindent.
   Documentation/driver-api/soundwire/stream.rst:304: WARNING: Explicit markup ends without a blank line; unexpected unindent.
   Documentation/driver-api/soundwire/stream.rst:328: WARNING: Explicit markup ends without a blank line; unexpected unindent.
   Documentation/driver-api/soundwire/stream.rst:352: WARNING: Explicit markup ends without a blank line; unexpected unindent.
   Documentation/driver-api/soundwire/stream.rst:364: WARNING: Explicit markup ends without a blank line; unexpected unindent.
   include/linux/spi/spi.h:373: ERROR: Unexpected indentation.
   Documentation/gpu/drivers.rst:5: WARNING: toctree contains reference to nonexisting document u'gpu/v3d'
   Documentation/misc-devices/ibmvmc.rst:2: WARNING: Explicit markup ends without a blank line; unexpected unindent.
   Documentation/networking/e100.rst:57: WARNING: Literal block expected; none found.
   Documentation/networking/e100.rst:68: WARNING: Literal block expected; none found.
   Documentation/networking/e100.rst:75: WARNING: Literal block expected; none found.
   Documentation/networking/e100.rst:84: WARNING: Literal block expected; none found.
   Documentation/networking/e100.rst:93: WARNING: Inline emphasis start-string without end-string.
   Documentation/networking/e1000.rst:83: ERROR: Unexpected indentation.
   Documentation/networking/e1000.rst:84: WARNING: Block quote ends without a blank line; unexpected unindent.
   Documentation/networking/e1000.rst:173: WARNING: Definition list ends without a blank line; unexpected unindent.
   Documentation/networking/e1000.rst:236: WARNING: Definition list ends without a blank line; unexpected unindent.
   net/core/dev.c:4650: ERROR: Unknown target name: "page_is".
   Documentation/networking/net_failover.rst:48: WARNING: Definition list ends without a blank line; unexpected unindent.
   Documentation/networking/net_failover.rst:50: ERROR: Unexpected indentation.
   Documentation/networking/net_failover.rst:52: ERROR: Unexpected indentation.
   Documentation/networking/net_failover.rst:53: WARNING: Block quote ends without a blank line; unexpected unindent.
   Documentation/networking/net_failover.rst:55: WARNING: Block quote ends without a blank line; unexpected unindent.
   Documentation/networking/net_failover.rst:63: ERROR: Unexpected indentation.
   Documentation/networking/net_failover.rst:64: WARNING: Block quote ends without a blank line; unexpected unindent.
   Documentation/networking/net_failover.rst:86: ERROR: Unexpected indentation.
   Documentation/networking/net_failover.rst:88: ERROR: Unexpected indentation.
   Documentation/networking/net_failover.rst:89: WARNING: Block quote ends without a blank line; unexpected unindent.
   Documentation/networking/net_failover.rst:91: WARNING: Block quote ends without a blank line; unexpected unindent.
   Documentation/process/2.Process.rst:131: ERROR: Malformed table.
   Bottom/header table border does not match top border.

vim +499 include/net/sock.h

^1da177e Linus Torvalds 2005-04-16 @499  

:::::: The code at line 499 was first introduced by commit
:::::: 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 Linux-2.6.12-rc2

:::::: TO: Linus Torvalds <torvalds@ppc970.osdl.org>
:::::: CC: Linus Torvalds <torvalds@ppc970.osdl.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6443 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v1 net-next 02/14] net: Add a new socket option for a future transmit time.
  2018-06-27 21:59 ` [PATCH v1 net-next 02/14] net: Add a new socket option for a future transmit time Jesus Sanchez-Palencia
  2018-06-27 22:16   ` Eric Dumazet
  2018-06-28  2:16   ` kbuild test robot
@ 2018-06-28 14:26   ` Willem de Bruijn
  2018-06-28 14:40     ` Willem de Bruijn
  2 siblings, 1 reply; 31+ messages in thread
From: Willem de Bruijn @ 2018-06-28 14:26 UTC (permalink / raw)
  To: Jesus Sanchez-Palencia
  Cc: Network Development, Thomas Gleixner, jan.altenberg,
	Vinicius Gomes, kurt.kanzenbach, Henrik Austad, Richard Cochran,
	Levi Pearson, ilias.apalodimas, ivan.khoronzhuk,
	Miroslav Lichvar, Willem de Bruijn, Jamal Hadi Salim, Cong Wang,
	Jiří Pírko, Richard Cochran

On Wed, Jun 27, 2018 at 6:08 PM Jesus Sanchez-Palencia
<jesus.sanchez-palencia@intel.com> wrote:
>
> From: Richard Cochran <rcochran@linutronix.de>
>
> This patch introduces SO_TXTIME. User space enables this option in
> order to pass a desired future transmit time in a CMSG when calling
> sendmsg(2). The argument to this socket option is a 6-bytes long struct
> defined as:
>
> struct sock_txtime {
>         clockid_t       clockid;
>         u16             flags;
> };

clockid_t is __kernel_clockid_t is int is a variable length field.
Please use fixed
length fields. Also, as MAX_CLOCKS is 16, only 4 bits are needed. A single u16
is probably sufficient as cmsg argument. To future proof, a u32 will
allow for more
than 4 flags. But in struct sock, 16 bits should be sufficient to
encode both clock id
and flags.

> Note that two new fields were added to struct sock by filling a 4-bytes
> hole found in the struct. For that reason, neither the struct size or
> number of cachelines were altered.
>
> Signed-off-by: Richard Cochran <rcochran@linutronix.de>
> Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
> ---

> +#include <asm/unaligned.h>
>  #include <linux/capability.h>
>  #include <linux/errno.h>
>  #include <linux/errqueue.h>
> @@ -697,6 +698,7 @@ EXPORT_SYMBOL(sk_mc_loop);
>  int sock_setsockopt(struct socket *sock, int level, int optname,
>                     char __user *optval, unsigned int optlen)
>  {
> +       struct sock_txtime sk_txtime;
>         struct sock *sk = sock->sk;
>         int val;
>         int valbool;
> @@ -1070,6 +1072,22 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
>                 }
>                 break;
>
> +       case SO_TXTIME:
> +               if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) {
> +                       ret = -EPERM;
> +               } else if (optlen != sizeof(struct sock_txtime)) {
> +                       ret = -EINVAL;
> +               } else if (copy_from_user(&sk_txtime, optval,
> +                          sizeof(struct sock_txtime))) {
> +                       ret = -EFAULT;
> +                       sock_valbool_flag(sk, SOCK_TXTIME, false);

Why change sk state on failure? This is not customary.

> +               } else {
> +                       sock_valbool_flag(sk, SOCK_TXTIME, true);
> +                       sk->sk_clockid = sk_txtime.clockid;
> +                       sk->sk_txtime_flags = sk_txtime.flags;

Validate input and fail on undefined flags.

> @@ -2137,6 +2162,13 @@ int __sock_cmsg_send(struct sock *sk, struct msghdr *msg, struct cmsghdr *cmsg,
>                 sockc->tsflags &= ~SOF_TIMESTAMPING_TX_RECORD_MASK;
>                 sockc->tsflags |= tsflags;
>                 break;
> +       case SCM_TXTIME:
> +               if (!sock_flag(sk, SOCK_TXTIME))
> +                       return -EINVAL;

Note that on lockfree datapaths like udp this test can race with the
setsockopt above.
It seems harmless here.

> +               if (cmsg->cmsg_len != CMSG_LEN(sizeof(u64)))
> +                       return -EINVAL;
> +               sockc->transmit_time = get_unaligned((u64 *)CMSG_DATA(cmsg));
> +               break;
>         /* SCM_RIGHTS and SCM_CREDENTIALS are semantically in SOL_UNIX. */
>         case SCM_RIGHTS:
>         case SCM_CREDENTIALS:
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v1 net-next 03/14] net: ipv4: Hook into time based transmission
  2018-06-27 21:59 ` [PATCH v1 net-next 03/14] net: ipv4: Hook into time based transmission Jesus Sanchez-Palencia
@ 2018-06-28 14:26   ` Willem de Bruijn
  0 siblings, 0 replies; 31+ messages in thread
From: Willem de Bruijn @ 2018-06-28 14:26 UTC (permalink / raw)
  To: Jesus Sanchez-Palencia
  Cc: Network Development, Thomas Gleixner, jan.altenberg,
	Vinicius Gomes, kurt.kanzenbach, Henrik Austad, Richard Cochran,
	Levi Pearson, ilias.apalodimas, ivan.khoronzhuk,
	Miroslav Lichvar, Willem de Bruijn, Jamal Hadi Salim, Cong Wang,
	Jiří Pírko, Richard Cochran

On Wed, Jun 27, 2018 at 6:07 PM Jesus Sanchez-Palencia
<jesus.sanchez-palencia@intel.com> wrote:
>
> Add a transmit_time field to struct inet_cork, then copy the
> timestamp from the CMSG cookie at ip_setup_cork() so we can
> safely copy it into the skb later during __ip_make_skb().
>
> For the raw fast path, just perform the copy at raw_send_hdrinc().
>
> Signed-off-by: Richard Cochran <rcochran@linutronix.de>
> Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
> ---
>  include/net/inet_sock.h | 1 +
>  net/ipv4/ip_output.c    | 3 +++
>  net/ipv4/raw.c          | 2 ++
>  net/ipv4/udp.c          | 1 +

Also support the feature for ipv6

>  4 files changed, 7 insertions(+)
>
> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
> index 83d5b3c2ac42..314be484c696 100644
> --- a/include/net/inet_sock.h
> +++ b/include/net/inet_sock.h
> @@ -148,6 +148,7 @@ struct inet_cork {
>         __s16                   tos;
>         char                    priority;
>         __u16                   gso_size;
> +       u64                     transmit_time;
>  };
>
>  struct inet_cork_full {
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index b3308e9d9762..904a54a090e9 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -1153,6 +1153,7 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
>         cork->tos = ipc->tos;
>         cork->priority = ipc->priority;
>         cork->tx_flags = ipc->tx_flags;
> +       cork->transmit_time = ipc->sockc.transmit_time;

Initialize ipc->sockc.transmit_time in all possible paths to avoid bugs like the
one fixed in commit 9887cba19978 ("ip: limit use of gso_size to udp").

>         return 0;
>  }
> @@ -1413,6 +1414,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
>
>         skb->priority = (cork->tos != -1) ? cork->priority: sk->sk_priority;
>         skb->mark = sk->sk_mark;
> +       skb->tstamp = cork->transmit_time;
>         /*
>          * Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec
>          * on dst refcount
> @@ -1495,6 +1497,7 @@ struct sk_buff *ip_make_skb(struct sock *sk,
>         cork->flags = 0;
>         cork->addr = 0;
>         cork->opt = NULL;
> +       cork->transmit_time = 0;

Not needed when unconditionally overwriting the field in ip_setup_cork.

>         err = ip_setup_cork(sk, cork, ipc, rtp);
>         if (err)
>                 return ERR_PTR(err);

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v1 net-next 13/14] net/sched: Enforce usage of CLOCK_TAI for sch_etf
  2018-06-27 21:59 ` [PATCH v1 net-next 13/14] net/sched: Enforce usage of CLOCK_TAI for sch_etf Jesus Sanchez-Palencia
@ 2018-06-28 14:26   ` Willem de Bruijn
  2018-06-28 17:11     ` Jesus Sanchez-Palencia
  0 siblings, 1 reply; 31+ messages in thread
From: Willem de Bruijn @ 2018-06-28 14:26 UTC (permalink / raw)
  To: Jesus Sanchez-Palencia
  Cc: Network Development, Thomas Gleixner, jan.altenberg,
	Vinicius Gomes, kurt.kanzenbach, Henrik Austad, Richard Cochran,
	Levi Pearson, ilias.apalodimas, ivan.khoronzhuk,
	Miroslav Lichvar, Willem de Bruijn, Jamal Hadi Salim, Cong Wang,
	Jiří Pírko

On Wed, Jun 27, 2018 at 8:45 PM Jesus Sanchez-Palencia
<jesus.sanchez-palencia@intel.com> wrote:
>
> The qdisc and the SO_TXTIME ABIs allow for a clockid to be configured,
> but it's been decided that usage of CLOCK_TAI should be enforced until
> we decide to allow for other clockids to be used. The rationale here is
> that PTP times are usually in the TAI scale, thus no other clocks should
> be necessary.
>
> For now, the qdisc will return EINVAL if any clocks other than
> CLOCK_TAI are used.
>
> Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
> ---
>  net/sched/sch_etf.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
> index cd6cb5b69228..5514a8aa3bd5 100644
> --- a/net/sched/sch_etf.c
> +++ b/net/sched/sch_etf.c
> @@ -56,8 +56,8 @@ static inline int validate_input_params(struct tc_etf_qopt *qopt,
>                 return -ENOTSUPP;
>         }
>
> -       if (qopt->clockid >= MAX_CLOCKS) {
> -               NL_SET_ERR_MSG(extack, "Invalid clockid");
> +       if (qopt->clockid != CLOCK_TAI) {
> +               NL_SET_ERR_MSG(extack, "Invalid clockid. CLOCK_TAI must be used");

Similar to the comment in patch 12, this should be squashed (into
patch 6) to avoid incorrect behavior in a range of SHA1s.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v1 net-next 14/14] net/sched: Make etf report drops on error_queue
  2018-06-27 21:59 ` [PATCH v1 net-next 14/14] net/sched: Make etf report drops on error_queue Jesus Sanchez-Palencia
@ 2018-06-28 14:27   ` Willem de Bruijn
  2018-06-29 17:48     ` Jesus Sanchez-Palencia
  0 siblings, 1 reply; 31+ messages in thread
From: Willem de Bruijn @ 2018-06-28 14:27 UTC (permalink / raw)
  To: Jesus Sanchez-Palencia
  Cc: Network Development, Thomas Gleixner, jan.altenberg,
	Vinicius Gomes, kurt.kanzenbach, Henrik Austad, Richard Cochran,
	Levi Pearson, ilias.apalodimas, ivan.khoronzhuk,
	Miroslav Lichvar, Willem de Bruijn, Jamal Hadi Salim, Cong Wang,
	Jiří Pírko

On Wed, Jun 27, 2018 at 6:07 PM Jesus Sanchez-Palencia
<jesus.sanchez-palencia@intel.com> wrote:
>
> Use the socket error queue for reporting dropped packets if the
> socket has enabled that feature through the SO_TXTIME API.
>
> Packets are dropped either on enqueue() if they aren't accepted by the
> qdisc or on dequeue() if the system misses their deadline. Those are
> reported as different errors so applications can react accordingly.
>
> Userspace can retrieve the errors through the socket error queue and the
> corresponding cmsg interfaces. A struct sock_extended_err* is used for
> returning the error data, and the packet's timestamp can be retrieved by
> adding both ee_data and ee_info fields as e.g.:
>
>     ((__u64) serr->ee_data << 32) + serr->ee_info
>
> This feature is disabled by default and must be explicitly enabled by
> applications. Enabling it can bring some overhead for the Tx cycles
> of the application.
>
> Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
> ---

>  struct sock_txtime {
>         clockid_t       clockid;        /* reference clockid */
> -       u16             flags;          /* bit 0: txtime in deadline_mode */
> +       u16             flags;          /* bit 0: txtime in deadline_mode
> +                                        * bit 1: report drops on sk err queue
> +                                        */
>  };

If this is shared with userspace, should be defined in an uapi header.
Same on the flag bits below. Self documenting code is preferable over
comments.

>  /*
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 73f4404e49e4..e681a45cfe7e 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -473,6 +473,7 @@ struct sock {
>         u16                     sk_clockid;
>         u16                     sk_txtime_flags;
>  #define SK_TXTIME_DEADLINE_MASK        BIT(0)
> +#define SK_TXTIME_RECV_ERR_MASK        BIT(1)

Integer bitfields are (arguably) more readable. There is no requirement
that the user interface be the same as the in-kernel implementation. Indeed
if you can save bits in struct sock, that is preferable (but not so for the ABI,
which cannot easily be extended).

>
>         struct socket           *sk_socket;
>         void                    *sk_user_data;
> diff --git a/include/uapi/linux/errqueue.h b/include/uapi/linux/errqueue.h
> index dc64cfaf13da..66fd5e443c94 100644
> --- a/include/uapi/linux/errqueue.h
> +++ b/include/uapi/linux/errqueue.h
> @@ -25,6 +25,8 @@ struct sock_extended_err {
>  #define SO_EE_OFFENDER(ee)     ((struct sockaddr*)((ee)+1))
>
>  #define SO_EE_CODE_ZEROCOPY_COPIED     1
> +#define SO_EE_CODE_TXTIME_INVALID_PARAM        2
> +#define SO_EE_CODE_TXTIME_MISSED       3
>
>  /**
>   *     struct scm_timestamping - timestamps exposed through cmsg
> diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
> index 5514a8aa3bd5..166f4b72875b 100644
> --- a/net/sched/sch_etf.c
> +++ b/net/sched/sch_etf.c
> @@ -11,6 +11,7 @@
>  #include <linux/kernel.h>
>  #include <linux/string.h>
>  #include <linux/errno.h>
> +#include <linux/errqueue.h>
>  #include <linux/rbtree.h>
>  #include <linux/skbuff.h>
>  #include <linux/posix-timers.h>
> @@ -124,6 +125,35 @@ static void reset_watchdog(struct Qdisc *sch)
>         qdisc_watchdog_schedule_ns(&q->watchdog, ktime_to_ns(next));
>  }
>
> +static void report_sock_error(struct sk_buff *skb, u32 err, u8 code)
> +{
> +       struct sock_exterr_skb *serr;
> +       ktime_t txtime = skb->tstamp;
> +
> +       if (!skb->sk || !(skb->sk->sk_txtime_flags & SK_TXTIME_RECV_ERR_MASK))
> +               return;
> +
> +       skb = skb_clone_sk(skb);
> +       if (!skb)
> +               return;
> +
> +       sock_hold(skb->sk);

Why take an extra reference? The skb holds a ref on the sk.

> +
> +       serr = SKB_EXT_ERR(skb);
> +       serr->ee.ee_errno = err;
> +       serr->ee.ee_origin = SO_EE_ORIGIN_LOCAL;

I suggest adding a new SO_EE_ORIGIN_TXTIME as opposed to overloading
the existing
local origin. Then the EE_CODE can start at 1, as ee_code can be
demultiplexed by origin.

> +       serr->ee.ee_type = 0;
> +       serr->ee.ee_code = code;
> +       serr->ee.ee_pad = 0;
> +       serr->ee.ee_data = (txtime >> 32); /* high part of tstamp */
> +       serr->ee.ee_info = txtime; /* low part of tstamp */
> +
> +       if (sock_queue_err_skb(skb->sk, skb))
> +               kfree_skb(skb);
> +
> +       sock_put(skb->sk);
> +}

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v1 net-next 02/14] net: Add a new socket option for a future transmit time.
  2018-06-28 14:26   ` Willem de Bruijn
@ 2018-06-28 14:40     ` Willem de Bruijn
  2018-06-28 18:33       ` Jesus Sanchez-Palencia
  0 siblings, 1 reply; 31+ messages in thread
From: Willem de Bruijn @ 2018-06-28 14:40 UTC (permalink / raw)
  To: Jesus Sanchez-Palencia
  Cc: Network Development, Thomas Gleixner, jan.altenberg,
	Vinicius Gomes, kurt.kanzenbach, Henrik Austad, Richard Cochran,
	Levi Pearson, ilias.apalodimas, ivan.khoronzhuk,
	Miroslav Lichvar, Willem de Bruijn, Jamal Hadi Salim, Cong Wang,
	Jiří Pírko, Richard Cochran

On Thu, Jun 28, 2018 at 10:26 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Wed, Jun 27, 2018 at 6:08 PM Jesus Sanchez-Palencia
> <jesus.sanchez-palencia@intel.com> wrote:
> >
> > From: Richard Cochran <rcochran@linutronix.de>
> >
> > This patch introduces SO_TXTIME. User space enables this option in
> > order to pass a desired future transmit time in a CMSG when calling
> > sendmsg(2). The argument to this socket option is a 6-bytes long struct
> > defined as:
> >
> > struct sock_txtime {
> >         clockid_t       clockid;
> >         u16             flags;
> > };
>
> clockid_t is __kernel_clockid_t is int is a variable length field.
> Please use fixed length fields.

Sorry, int is fine, of course, and clockid_t is used between userspace and
kernel already.

> Also, as MAX_CLOCKS is 16, only 4 bits are needed. A single u16
> is probably sufficient as cmsg argument. To future proof, a u32 will
> allow for more
> than 4 flags. But in struct sock, 16 bits should be sufficient to
> encode both clock id
> and flags.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v1 net-next 13/14] net/sched: Enforce usage of CLOCK_TAI for sch_etf
  2018-06-28 14:26   ` Willem de Bruijn
@ 2018-06-28 17:11     ` Jesus Sanchez-Palencia
  0 siblings, 0 replies; 31+ messages in thread
From: Jesus Sanchez-Palencia @ 2018-06-28 17:11 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, Thomas Gleixner, jan.altenberg,
	Vinicius Gomes, kurt.kanzenbach, Henrik Austad, Richard Cochran,
	ilias.apalodimas, ivan.khoronzhuk, Miroslav Lichvar,
	Willem de Bruijn, Jamal Hadi Salim, Cong Wang,
	Jiří Pírko



On 06/28/2018 07:26 AM, Willem de Bruijn wrote:
> On Wed, Jun 27, 2018 at 8:45 PM Jesus Sanchez-Palencia
> <jesus.sanchez-palencia@intel.com> wrote:
>>
>> The qdisc and the SO_TXTIME ABIs allow for a clockid to be configured,
>> but it's been decided that usage of CLOCK_TAI should be enforced until
>> we decide to allow for other clockids to be used. The rationale here is
>> that PTP times are usually in the TAI scale, thus no other clocks should
>> be necessary.
>>
>> For now, the qdisc will return EINVAL if any clocks other than
>> CLOCK_TAI are used.
>>
>> Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
>> ---
>>  net/sched/sch_etf.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
>> index cd6cb5b69228..5514a8aa3bd5 100644
>> --- a/net/sched/sch_etf.c
>> +++ b/net/sched/sch_etf.c
>> @@ -56,8 +56,8 @@ static inline int validate_input_params(struct tc_etf_qopt *qopt,
>>                 return -ENOTSUPP;
>>         }
>>
>> -       if (qopt->clockid >= MAX_CLOCKS) {
>> -               NL_SET_ERR_MSG(extack, "Invalid clockid");
>> +       if (qopt->clockid != CLOCK_TAI) {
>> +               NL_SET_ERR_MSG(extack, "Invalid clockid. CLOCK_TAI must be used");
> 
> Similar to the comment in patch 12, this should be squashed (into
> patch 6) to avoid incorrect behavior in a range of SHA1s.


Ok. Fixed for v2.

Thanks,
Jesus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v1 net-next 12/14] igb: Only call skb_tx_timestamp after descriptors are ready
  2018-06-27 23:56   ` Eric Dumazet
@ 2018-06-28 17:12     ` Jesus Sanchez-Palencia
  0 siblings, 0 replies; 31+ messages in thread
From: Jesus Sanchez-Palencia @ 2018-06-28 17:12 UTC (permalink / raw)
  To: Eric Dumazet, netdev
  Cc: tglx, jan.altenberg, vinicius.gomes, kurt.kanzenbach, henrik,
	richardcochran, levi.pearson, ilias.apalodimas, ivan.khoronzhuk,
	mlichvar, willemb, jhs, xiyou.wangcong, jiri



On 06/27/2018 04:56 PM, Eric Dumazet wrote:
> 
> 
> On 06/27/2018 02:59 PM, Jesus Sanchez-Palencia wrote:
>> Currently, skb_tx_timestamp() is being called before the DMA
>> descriptors are prepared in igb_xmit_frame_ring(), which happens
>> during either the igb_tso() or igb_tx_csum() calls.
>>
>> Given that now the skb->tstamp might be used to carry the timestamp
>> for SO_TXTIME, we must only call skb_tx_timestamp() after the
>> information has been copied into the DMA tx_ring.
> 
> 
> Since when this skb->tstamp use happened ?
> 
> If this is in patch 11/14 (igb: Add support for ETF offload), then you should either :
> 
> 1) Squash this into 11/14
> 
> 2) swap 11 and 12 patch, so that this change is done before "igb: Add support for ETF offload"  
> 
> Otherwise a bisection could fail badly.


OK. Fixed for v2 by swapping patches 11 and 12.

Thanks,
Jesus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v1 net-next 02/14] net: Add a new socket option for a future transmit time.
  2018-06-28 14:40     ` Willem de Bruijn
@ 2018-06-28 18:33       ` Jesus Sanchez-Palencia
  0 siblings, 0 replies; 31+ messages in thread
From: Jesus Sanchez-Palencia @ 2018-06-28 18:33 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, Thomas Gleixner, jan.altenberg,
	Vinicius Gomes, kurt.kanzenbach, Henrik Austad, Richard Cochran,
	Levi Pearson, ilias.apalodimas, ivan.khoronzhuk,
	Miroslav Lichvar, Willem de Bruijn, Jamal Hadi Salim, Cong Wang,
	Jiří Pírko, Richard Cochran

Hi Willem,


On 06/28/2018 07:40 AM, Willem de Bruijn wrote:
> On Thu, Jun 28, 2018 at 10:26 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
>>
>> On Wed, Jun 27, 2018 at 6:08 PM Jesus Sanchez-Palencia
>> <jesus.sanchez-palencia@intel.com> wrote:
>>>
>>> From: Richard Cochran <rcochran@linutronix.de>
>>>
>>> This patch introduces SO_TXTIME. User space enables this option in
>>> order to pass a desired future transmit time in a CMSG when calling
>>> sendmsg(2). The argument to this socket option is a 6-bytes long struct
>>> defined as:
>>>
>>> struct sock_txtime {
>>>         clockid_t       clockid;
>>>         u16             flags;
>>> };
>>
>> clockid_t is __kernel_clockid_t is int is a variable length field.
>> Please use fixed length fields.
> 
> Sorry, int is fine, of course, and clockid_t is used between userspace and
> kernel already.


Great. So, in addition to the other feedback in sock.c, what I'm thinking here
for the v2 is:

- move this struct to and the flags definition (as enums) to
include/uapi/linux/net_tstamp.h;

- keep clockid as a clockid_t and increase flags to u32 since this already takes
8 bytes in total anyway;

- reduce sk_clockid and sk_txtime_flags from struct sock from a u16 to a u8 each.


Thanks,
Jesus



> 
>> Also, as MAX_CLOCKS is 16, only 4 bits are needed. A single u16
>> is probably sufficient as cmsg argument. To future proof, a u32 will
>> allow for more
>> than 4 flags. But in struct sock, 16 bits should be sufficient to
>> encode both clock id
>> and flags.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v1 net-next 14/14] net/sched: Make etf report drops on error_queue
  2018-06-28 14:27   ` Willem de Bruijn
@ 2018-06-29 17:48     ` Jesus Sanchez-Palencia
  2018-06-29 18:49       ` Willem de Bruijn
  0 siblings, 1 reply; 31+ messages in thread
From: Jesus Sanchez-Palencia @ 2018-06-29 17:48 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, Thomas Gleixner, jan.altenberg,
	Vinicius Gomes, kurt.kanzenbach, Henrik Austad, Richard Cochran,
	ilias.apalodimas, ivan.khoronzhuk, Miroslav Lichvar,
	Willem de Bruijn, Jamal Hadi Salim, Cong Wang,
	Jiří Pírko

Hi Willem,


On 06/28/2018 07:27 AM, Willem de Bruijn wrote:

(...)

> 
>>  struct sock_txtime {
>>         clockid_t       clockid;        /* reference clockid */
>> -       u16             flags;          /* bit 0: txtime in deadline_mode */
>> +       u16             flags;          /* bit 0: txtime in deadline_mode
>> +                                        * bit 1: report drops on sk err queue
>> +                                        */
>>  };
> 
> If this is shared with userspace, should be defined in an uapi header.
> Same on the flag bits below. Self documenting code is preferable over
> comments.


Fixed for v2.


> 
>>  /*
>> diff --git a/include/net/sock.h b/include/net/sock.h
>> index 73f4404e49e4..e681a45cfe7e 100644
>> --- a/include/net/sock.h
>> +++ b/include/net/sock.h
>> @@ -473,6 +473,7 @@ struct sock {
>>         u16                     sk_clockid;
>>         u16                     sk_txtime_flags;
>>  #define SK_TXTIME_DEADLINE_MASK        BIT(0)
>> +#define SK_TXTIME_RECV_ERR_MASK        BIT(1)
> 
> Integer bitfields are (arguably) more readable. There is no requirement
> that the user interface be the same as the in-kernel implementation. Indeed
> if you can save bits in struct sock, that is preferable (but not so for the ABI,
> which cannot easily be extended).


Sure, changed for v2.

(...)


>> diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
>> index 5514a8aa3bd5..166f4b72875b 100644
>> --- a/net/sched/sch_etf.c
>> +++ b/net/sched/sch_etf.c
>> @@ -11,6 +11,7 @@
>>  #include <linux/kernel.h>
>>  #include <linux/string.h>
>>  #include <linux/errno.h>
>> +#include <linux/errqueue.h>
>>  #include <linux/rbtree.h>
>>  #include <linux/skbuff.h>
>>  #include <linux/posix-timers.h>
>> @@ -124,6 +125,35 @@ static void reset_watchdog(struct Qdisc *sch)
>>         qdisc_watchdog_schedule_ns(&q->watchdog, ktime_to_ns(next));
>>  }
>>
>> +static void report_sock_error(struct sk_buff *skb, u32 err, u8 code)
>> +{
>> +       struct sock_exterr_skb *serr;
>> +       ktime_t txtime = skb->tstamp;
>> +
>> +       if (!skb->sk || !(skb->sk->sk_txtime_flags & SK_TXTIME_RECV_ERR_MASK))
>> +               return;
>> +
>> +       skb = skb_clone_sk(skb);
>> +       if (!skb)
>> +               return;
>> +
>> +       sock_hold(skb->sk);
> 
> Why take an extra reference? The skb holds a ref on the sk.


Yes, the cloned skb holds a ref on the socket, but the documentation of
skb_clone_sk() makes this explicit suggestion:

(...)
 * When passing buffers allocated with this function to sock_queue_err_skb
 * it is necessary to wrap the call with sock_hold/sock_put in order to
 * prevent the socket from being released prior to being enqueued on
 * the sk_error_queue.
 */

which I believe is here just so we are protected against a possible race after
skb_orphan() is called from sock_queue_err_skb(). Please let me know if I'm
misreading anything.

And for v2 I will move the sock_hold() call to immediately before the
sock_queue_err_skb() to avoid any future confusion.



> 
>> +
>> +       serr = SKB_EXT_ERR(skb);
>> +       serr->ee.ee_errno = err;
>> +       serr->ee.ee_origin = SO_EE_ORIGIN_LOCAL;
> 
> I suggest adding a new SO_EE_ORIGIN_TXTIME as opposed to overloading
> the existing
> local origin. Then the EE_CODE can start at 1, as ee_code can be
> demultiplexed by origin.


OK, it looks better indeed. Fixed for v2.


> 
>> +       serr->ee.ee_type = 0;
>> +       serr->ee.ee_code = code;
>> +       serr->ee.ee_pad = 0;
>> +       serr->ee.ee_data = (txtime >> 32); /* high part of tstamp */
>> +       serr->ee.ee_info = txtime; /* low part of tstamp */
>> +
>> +       if (sock_queue_err_skb(skb->sk, skb))
>> +               kfree_skb(skb);
>> +
>> +       sock_put(skb->sk);
>> +}


Thanks,
Jesus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v1 net-next 14/14] net/sched: Make etf report drops on error_queue
  2018-06-29 17:48     ` Jesus Sanchez-Palencia
@ 2018-06-29 18:49       ` Willem de Bruijn
  2018-06-29 20:14         ` Jesus Sanchez-Palencia
  0 siblings, 1 reply; 31+ messages in thread
From: Willem de Bruijn @ 2018-06-29 18:49 UTC (permalink / raw)
  To: Jesus Sanchez-Palencia
  Cc: Network Development, Thomas Gleixner, jan.altenberg,
	Vinicius Gomes, kurt.kanzenbach, Henrik Austad, Richard Cochran,
	ilias.apalodimas, ivan.khoronzhuk, Miroslav Lichvar,
	Willem de Bruijn, Jamal Hadi Salim, Cong Wang,
	Jiří Pírko, Alexander Duyck

> >> diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
> >> index 5514a8aa3bd5..166f4b72875b 100644
> >> --- a/net/sched/sch_etf.c
> >> +++ b/net/sched/sch_etf.c
> >> @@ -11,6 +11,7 @@
> >>  #include <linux/kernel.h>
> >>  #include <linux/string.h>
> >>  #include <linux/errno.h>
> >> +#include <linux/errqueue.h>
> >>  #include <linux/rbtree.h>
> >>  #include <linux/skbuff.h>
> >>  #include <linux/posix-timers.h>
> >> @@ -124,6 +125,35 @@ static void reset_watchdog(struct Qdisc *sch)
> >>         qdisc_watchdog_schedule_ns(&q->watchdog, ktime_to_ns(next));
> >>  }
> >>
> >> +static void report_sock_error(struct sk_buff *skb, u32 err, u8 code)
> >> +{
> >> +       struct sock_exterr_skb *serr;
> >> +       ktime_t txtime = skb->tstamp;
> >> +
> >> +       if (!skb->sk || !(skb->sk->sk_txtime_flags & SK_TXTIME_RECV_ERR_MASK))
> >> +               return;
> >> +
> >> +       skb = skb_clone_sk(skb);
> >> +       if (!skb)
> >> +               return;
> >> +
> >> +       sock_hold(skb->sk);
> >
> > Why take an extra reference? The skb holds a ref on the sk.
>
>
> Yes, the cloned skb holds a ref on the socket, but the documentation of
> skb_clone_sk() makes this explicit suggestion:
>
> (...)
>  * When passing buffers allocated with this function to sock_queue_err_skb
>  * it is necessary to wrap the call with sock_hold/sock_put in order to
>  * prevent the socket from being released prior to being enqueued on
>  * the sk_error_queue.
>  */
>
> which I believe is here just so we are protected against a possible race after
> skb_orphan() is called from sock_queue_err_skb(). Please let me know if I'm
> misreading anything.

Yes, indeed. Code only has to worry about that if there are no
concurrent references
on the socket.

I may be mistaken, but I believe that this complicated logic exists
only for cases where
the timestamp may be queued after the original skb has been released.
Specifically,
when a tx timestamp is returned from a hardware device after transmission of the
original skb. Then the cloned timestamp skb needs its own reference on
the sk while
it is waiting for the timestamp data (i.e., until the device
completion arrives) and then
we need a temporary extra ref to work around the skb_orphan in
sock_queue_err_skb.

Compare skb_complete_tx_timestamp with skb_tstamp_tx. The second is used in
the regular datapath to clone an skb and queue it on the error queue
immediately,
while holding the original skb. This does not call skb_clone_sk and
does not need the
extra sock_hold. This should be good enough for this code path, too.
As kb holds a
ref on skb->sk, the socket cannot go away in the middle of report_sock_error.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v1 net-next 14/14] net/sched: Make etf report drops on error_queue
  2018-06-29 18:49       ` Willem de Bruijn
@ 2018-06-29 20:14         ` Jesus Sanchez-Palencia
  0 siblings, 0 replies; 31+ messages in thread
From: Jesus Sanchez-Palencia @ 2018-06-29 20:14 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, Thomas Gleixner, jan.altenberg,
	Vinicius Gomes, kurt.kanzenbach, Henrik Austad, Richard Cochran,
	ilias.apalodimas, ivan.khoronzhuk, Miroslav Lichvar,
	Willem de Bruijn, Jamal Hadi Salim, Cong Wang,
	Jiří Pírko, Alexander Duyck




On 06/29/2018 11:49 AM, Willem de Bruijn wrote:
>>>> diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
>>>> +static void report_sock_error(struct sk_buff *skb, u32 err, u8 code)
>>>> +{
>>>> +       struct sock_exterr_skb *serr;
>>>> +       ktime_t txtime = skb->tstamp;
>>>> +
>>>> +       if (!skb->sk || !(skb->sk->sk_txtime_flags & SK_TXTIME_RECV_ERR_MASK))
>>>> +               return;
>>>> +
>>>> +       skb = skb_clone_sk(skb);
>>>> +       if (!skb)
>>>> +               return;
>>>> +
>>>> +       sock_hold(skb->sk);
>>>
>>> Why take an extra reference? The skb holds a ref on the sk.
>>
>>
>> Yes, the cloned skb holds a ref on the socket, but the documentation of
>> skb_clone_sk() makes this explicit suggestion:
>>
>> (...)
>>  * When passing buffers allocated with this function to sock_queue_err_skb
>>  * it is necessary to wrap the call with sock_hold/sock_put in order to
>>  * prevent the socket from being released prior to being enqueued on
>>  * the sk_error_queue.
>>  */
>>
>> which I believe is here just so we are protected against a possible race after
>> skb_orphan() is called from sock_queue_err_skb(). Please let me know if I'm
>> misreading anything.
> 
> Yes, indeed. Code only has to worry about that if there are no
> concurrent references
> on the socket.
> 
> I may be mistaken, but I believe that this complicated logic exists
> only for cases where
> the timestamp may be queued after the original skb has been released.
> Specifically,
> when a tx timestamp is returned from a hardware device after transmission of the
> original skb. Then the cloned timestamp skb needs its own reference on
> the sk while
> it is waiting for the timestamp data (i.e., until the device
> completion arrives) and then
> we need a temporary extra ref to work around the skb_orphan in
> sock_queue_err_skb.
> 
> Compare skb_complete_tx_timestamp with skb_tstamp_tx. The second is used in
> the regular datapath to clone an skb and queue it on the error queue
> immediately,
> while holding the original skb. This does not call skb_clone_sk and
> does not need the
> extra sock_hold. This should be good enough for this code path, too.
> As kb holds a
> ref on skb->sk, the socket cannot go away in the middle of report_sock_error.


Oh, that makes sense. Great, I will give this a try and add it to the v2.

Thanks,
Jesus

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2018-06-29 20:19 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-27 21:59 [PATCH v1 net-next 00/14] Scheduled packet Transmission: ETF Jesus Sanchez-Palencia
2018-06-27 21:59 ` [PATCH v1 net-next 01/14] net: Clear skb->tstamp only on the forwarding path Jesus Sanchez-Palencia
2018-06-27 21:59 ` [PATCH v1 net-next 02/14] net: Add a new socket option for a future transmit time Jesus Sanchez-Palencia
2018-06-27 22:16   ` Eric Dumazet
2018-06-27 23:07     ` Jesus Sanchez-Palencia
2018-06-28  0:05       ` Eric Dumazet
2018-06-28  2:16   ` kbuild test robot
2018-06-28 14:26   ` Willem de Bruijn
2018-06-28 14:40     ` Willem de Bruijn
2018-06-28 18:33       ` Jesus Sanchez-Palencia
2018-06-27 21:59 ` [PATCH v1 net-next 03/14] net: ipv4: Hook into time based transmission Jesus Sanchez-Palencia
2018-06-28 14:26   ` Willem de Bruijn
2018-06-27 21:59 ` [PATCH v1 net-next 04/14] net: packet: " Jesus Sanchez-Palencia
2018-06-27 21:59 ` [PATCH v1 net-next 05/14] net/sched: Allow creating a Qdisc watchdog with other clocks Jesus Sanchez-Palencia
2018-06-27 21:59 ` [PATCH v1 net-next 06/14] net/sched: Introduce the ETF Qdisc Jesus Sanchez-Palencia
2018-06-27 21:59 ` [PATCH v1 net-next 07/14] net/sched: Add HW offloading capability to ETF Jesus Sanchez-Palencia
2018-06-27 21:59 ` [PATCH v1 net-next 08/14] igb: Refactor igb_configure_cbs() Jesus Sanchez-Palencia
2018-06-27 21:59 ` [PATCH v1 net-next 09/14] igb: Only change Tx arbitration when CBS is on Jesus Sanchez-Palencia
2018-06-27 21:59 ` [PATCH v1 net-next 10/14] igb: Refactor igb_offload_cbs() Jesus Sanchez-Palencia
2018-06-27 21:59 ` [PATCH v1 net-next 11/14] igb: Add support for ETF offload Jesus Sanchez-Palencia
2018-06-27 21:59 ` [PATCH v1 net-next 12/14] igb: Only call skb_tx_timestamp after descriptors are ready Jesus Sanchez-Palencia
2018-06-27 23:56   ` Eric Dumazet
2018-06-28 17:12     ` Jesus Sanchez-Palencia
2018-06-27 21:59 ` [PATCH v1 net-next 13/14] net/sched: Enforce usage of CLOCK_TAI for sch_etf Jesus Sanchez-Palencia
2018-06-28 14:26   ` Willem de Bruijn
2018-06-28 17:11     ` Jesus Sanchez-Palencia
2018-06-27 21:59 ` [PATCH v1 net-next 14/14] net/sched: Make etf report drops on error_queue Jesus Sanchez-Palencia
2018-06-28 14:27   ` Willem de Bruijn
2018-06-29 17:48     ` Jesus Sanchez-Palencia
2018-06-29 18:49       ` Willem de Bruijn
2018-06-29 20:14         ` Jesus Sanchez-Palencia

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.