* [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
@ 2017-09-01  1:26 Vinicius Costa Gomes
  2017-09-01  1:26 ` [RFC net-next 1/5] net/sched: Introduce the user API for the CBS shaper Vinicius Costa Gomes
                   ` (9 more replies)
  0 siblings, 10 replies; 45+ messages in thread
From: Vinicius Costa Gomes @ 2017-09-01  1:26 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, intel-wired-lan,
	andre.guedes, ivan.briano, jesus.sanchez-palencia,
	boon.leong.ong, richardcochran

Hi,

This patchset is an RFC proposing how the Traffic Control subsystem can be used
to offload the configuration of traffic shapers into network devices that
support them in HW. Our goal here is to start upstreaming support for features
related to the Time-Sensitive Networking (TSN) set of standards into the
kernel.

As part of this work, we've assessed previous public discussions related to TSN
enabling: patches from Henrik Austad (Cisco), the presentation from Eric Mann
at Linux Plumbers 2012, patches from Gangfeng Huang (National Instruments) and
the current state of the OpenAVNU project (https://github.com/AVnu/OpenAvnu/).

Please note that the patches provided as part of this RFC implement only what
is needed for 802.1Qav (FQTSS), but we'd like to take advantage of this
discussion and share our WIP ideas for the 802.1Qbv and 802.1Qbu interfaces
as well. The current patches only provide support for HW offload of the
configuration.


Overview
========

Time-Sensitive Networking (TSN) is a set of standards that aim to address
resource availability for providing bandwidth reservation and bounded latency
on Ethernet-based LANs. The proposal described here aims to cover mainly what is
needed to enable the following standards: 802.1Qat, 802.1Qav, 802.1Qbv and
802.1Qbu.

The initial target of this work is the Intel i210 NIC, but other controllers'
datasheets were also taken into account, such as the Renesas RZ/A1H / RZ/A1M
group and the Synopsys DesignWare Ethernet QoS controller.


Proposal
========

Feature-wise, this proposal covers configuration interfaces for HW
implementations of the Credit-Based Shaper (CBS, 802.1Qav), the Time-Aware
Shaper (802.1Qbv) and Frame Preemption (802.1Qbu). CBS is a per-queue shaper,
while Qbv and Qbu must be configured per port, with the configuration covering
all queues. Given that these features are related to traffic shaping, and that
the traffic control subsystem already provides a queueing discipline that
offloads its config into the device driver (i.e. mqprio), designing new qdiscs
for the specific purpose of offloading the config of each shaper seemed like a
good fit.

For steering traffic into the correct queues, we use the SO_PRIORITY socket
option together with a mechanism that maps priorities to traffic classes / Tx
queues. The mqprio qdisc is currently used in our tests.
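On the application side this amounts to a single setsockopt() call before
transmitting. A minimal sketch, mirroring the talker sample added in patch 4/5
of this series (open_prio_socket() is just an illustrative helper name):

#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open an AF_PACKET socket and tag all traffic sent through it with the
 * given SO_PRIORITY, which mqprio then maps to a traffic class / Tx queue.
 */
static int open_prio_socket(int prio)
{
	int fd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_TSN));

	if (fd < 0)
		return -1;

	if (setsockopt(fd, SOL_SOCKET, SO_PRIORITY, &prio, sizeof(prio)) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}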

As for the shapers' config interfaces:

 * CBS (802.1Qav)

   This patchset is proposing a new qdisc called 'cbs'. Its 'tc' cmd line is:
   $ tc qdisc add dev IFACE parent ID cbs locredit N hicredit M sendslope S \
     idleslope I

   Note that the parameters for this qdisc are the ones defined by the
   802.1Q-2014 spec, so no hardware specific functionality is exposed here.
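   The slopes are given in kbit/s and the credits in bytes (this is how both
   the igb patch and the helper script later in this series interpret them).
   For instance, the "Testing this RFC" section below reserves 20Mbps for
   class A on a 1Gbps link using:
   locredit -1470 hicredit 30 sendslope -980000 idleslope 20000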


 * Time-aware shaper (802.1Qbv):

   The idea we are currently exploring is to add a "time-aware", priority-based
   qdisc that also exposes the available Tx queues and provides a mechanism for
   mapping priority <-> traffic class <-> Tx queues in a fashion similar to
   mqprio. We are calling this qdisc 'taprio', and its 'tc' cmd line would be:

   $ tc qdisc add dev ens4 parent root handle 100 taprio num_tc 4 \
         map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3                      \
         queues 0 1 2 3                                           \
         sched-file gates.sched [base-time <interval>]            \
         [cycle-time <interval>] [extension-time <interval>]

   The schedule file ('gates.sched' above) is multi-line, with each line
   having the following format:
   <cmd> <gate mask> <interval in nanoseconds>

   Qbv only defines one <cmd>: "S" for 'SetGates'

   For example:

   S 0x01 300
   S 0x03 500

   This means that there are two intervals: the first has the gate for
   traffic class 0 open for 300 nanoseconds, and the second has the gates
   for both traffic classes open for 500 nanoseconds.

   Additionally, 'taprio' will provide an option to set just one entry of the
   gate control list:

   $ tc qdisc (...) \
        sched-row <row number> <cmd> <gate mask> <interval>  \
        [base-time <interval>] [cycle-time <interval>] \
        [extension-time <interval>]


 * Frame Preemption (802.1Qbu):

   To control latency even further, it may prove useful to signal which
   traffic classes are marked as preemptable. For that, 'taprio' provides a
   preemption command so you can set each traffic class as preemptable or not:

   $ tc qdisc (...) \
        preemption 0 1 1 1


 * Time-aware shaper + Preemption:

   As an example of how Qbv and Qbu can be used together, we may specify
   both the schedule and the preempt-mask; this way we can also express the
   Set-Gates-and-Hold and Set-Gates-and-Release commands specified in the
   Qbu spec:

   $ tc qdisc add dev ens4 parent root handle 100 taprio num_tc 4 \
         map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3                      \
         queues 0 1 2 3                                           \
         preemption 0 1 1 1                                       \
         sched-file preempt_gates.sched

    The schedule file ('preempt_gates.sched' above) is multi-line, with each
    line having the same format as before:
    <cmd> <gate mask> <interval in nanoseconds>

    For this case, two new commands are introduced:

    "H" for 'set gates and hold'
    "R" for 'set gates and release'

    H 0x01 300
    R 0x03 500



Testing this RFC
================

For testing just the patches in this RFC, you can refer to the samples and
helper script added under samples/tsn/, and use the 'mqprio' qdisc to
set up the priority to Tx queue mapping, together with the 'cbs' qdisc to
configure the HW shaper of the i210 controller:

1) Set up the priority -> traffic class -> hardware queue mapping
$ tc qdisc replace dev enp3s0 parent root mqprio num_tc 3 \
     map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0

2) Check the resulting scheme. You want to get the IDs of the inner qdiscs,
   counting from the bottom up
$ tc -g  class show dev enp3s0

Ex.:
+---(802a:3) mqprio
|    +---(802a:6) mqprio
|    +---(802a:7) mqprio
|
+---(802a:2) mqprio
|    +---(802a:5) mqprio
|
+---(802a:1) mqprio
     +---(802a:4) mqprio

 * Here '802a:4' is Tx Queue #0 and '802a:5' is Tx Queue #1.
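 * The 'cbs' qdisc recovers the HW queue index from this parent handle: in
   cbs_change() (patch 2/5), queue = TC_H_MIN(parent) - 1 - num_tc, e.g.
   4 - 1 - 3 = 0 for '802a:4' and 5 - 1 - 3 = 1 for '802a:5'. This is also
   why, for now, 'cbs' needs to be installed "under" mqprio.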

3) Calculate the CBS parameters for classes A and B, e.g. 20Mbps of bandwidth
   for class A and 10Mbps for class B:
$ ./samples/tsn/calculate_cbs_params.py -A 20000 -a 1500 -B 10000 -b 1500
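
 * For reference, the script follows the equations from 802.1Q-2014 Annex L.
   With the default 1Gbps link speed and 1500 byte non-SR frames, plus the
   1500 byte SR frame sizes given above, the parameters work out to:

   Class A: sendslope = 20000 - 1000000                            = -980000 kbps
            hicredit  = ceil(20000 * 1500 / 1000000)               = 30 bytes
            locredit  = ceil(-980000 * 1500 / 1000000)             = -1470 bytes

   Class B: sendslope = 10000 - 1000000                            = -990000 kbps
            hicredit  = ceil(10000 * (1500/980000 + 1500/1000000)) = 31 bytes
            locredit  = ceil(-990000 * 1500 / 1000000)             = -1485 bytes

   These are the values used in steps 4) and 5) below.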

4) Configure CBS for traffic class A (priority 3) as provided by the script:
$ tc qdisc replace dev enp3s0 parent 802a:4 cbs locredit -1470 \
     hicredit 30 sendslope -980000 idleslope 20000

5) Configure CBS for traffic class B (priority 2):
$ tc qdisc replace dev enp3s0 parent 802a:5 cbs \
     locredit -1485 hicredit 31 sendslope -990000 idleslope 10000

6) Run Listener, compiled from samples/tsn/listener.c
$ ./listener -i enp3s0

7) Run Talker for class A (prio 3 here), compiled from samples/tsn/talker.c
$ ./talker -i enp3s0 -p 3

 * The bandwidth displayed on the listener output at this stage should be very
   close to the one configured for class A.

8) You can also run a Talker for class B (prio 2 here)
$ ./talker -i enp3s0 -p 2

 * The bandwidth displayed on the listener output now should increase to very
   close to the one configured for class A + class B.

Authors
=======
 - Andre Guedes <andre.guedes@intel.com>
 - Ivan Briano <ivan.briano@intel.com>
 - Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
 - Vinicius Gomes <vinicius.gomes@intel.com>


Andre Guedes (2):
  igb: Add support for CBS offload
  samples/tsn: Add script for calculating CBS config

Jesus Sanchez-Palencia (1):
  sample: Add TSN Talker and Listener examples

Vinicius Costa Gomes (2):
  net/sched: Introduce the user API for the CBS shaper
  net/sched: Introduce Credit Based Shaper (CBS) qdisc

 drivers/net/ethernet/intel/igb/e1000_defines.h |  23 ++
 drivers/net/ethernet/intel/igb/e1000_regs.h    |   8 +
 drivers/net/ethernet/intel/igb/igb.h           |   6 +
 drivers/net/ethernet/intel/igb/igb_main.c      | 349 +++++++++++++++++++++++++
 include/linux/netdevice.h                      |   1 +
 include/uapi/linux/pkt_sched.h                 |  29 ++
 net/sched/Kconfig                              |  11 +
 net/sched/Makefile                             |   1 +
 net/sched/sch_cbs.c                            | 286 ++++++++++++++++++++
 samples/tsn/calculate_cbs_params.py            | 112 ++++++++
 samples/tsn/listener.c                         | 254 ++++++++++++++++++
 samples/tsn/talker.c                           | 136 ++++++++++
 12 files changed, 1216 insertions(+)
 create mode 100644 net/sched/sch_cbs.c
 create mode 100755 samples/tsn/calculate_cbs_params.py
 create mode 100644 samples/tsn/listener.c
 create mode 100644 samples/tsn/talker.c

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [RFC net-next 1/5] net/sched: Introduce the user API for the CBS shaper
  2017-09-01  1:26 [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers Vinicius Costa Gomes
@ 2017-09-01  1:26 ` Vinicius Costa Gomes
  2017-09-01  1:26 ` [RFC net-next 2/5] net/sched: Introduce Credit Based Shaper (CBS) qdisc Vinicius Costa Gomes
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 45+ messages in thread
From: Vinicius Costa Gomes @ 2017-09-01  1:26 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, intel-wired-lan,
	andre.guedes, ivan.briano, jesus.sanchez-palencia,
	boon.leong.ong, richardcochran

Export the API necessary for configuring the CBS shaper (implemented
in the next patch) via the tc tool.
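
As an illustration of how a driver is expected to consume this structure
through ndo_setup_tc(), here is a sketch only; foo_setup_tc() and
foo_hw_apply_cbs() are hypothetical names, see the igb patch later in this
series for the real handler:

static int foo_setup_tc(struct net_device *dev, enum tc_setup_type type,
			void *type_data)
{
	struct tc_cbs_qopt_offload *cbs;

	if (type != TC_SETUP_CBS)
		return -ENOTSUPP;

	cbs = type_data;

	/* 'queue' is the HW Tx queue index and 'enable' selects between
	 * installing and resetting the shaper; the slopes are in kbit/s
	 * and the credits in bytes.
	 */
	return foo_hw_apply_cbs(dev, cbs->queue, cbs->enable,
				cbs->idleslope, cbs->sendslope,
				cbs->hicredit, cbs->locredit);
}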

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 include/uapi/linux/pkt_sched.h | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 099bf5528fed..aa4a3e5421be 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -871,4 +871,33 @@ struct tc_pie_xstats {
 	__u32 maxq;             /* maximum queue size */
 	__u32 ecn_mark;         /* packets marked with ecn*/
 };
+
+/* CBS */
+/* FIXME: this is only for usage with ndo_setup_tc(), this should be
+ * in another header someplace else. Is pkt_cls.h the right place?
+ */
+struct tc_cbs_qopt_offload {
+	u8		enable;
+	s32		queue;
+	s32		hicredit;
+	s32		locredit;
+	s32		idleslope;
+	s32		sendslope;
+};
+
+struct tc_cbs_qopt {
+	__s32		hicredit;
+	__s32		locredit;
+	__s32		idleslope;
+	__s32		sendslope;
+};
+
+enum {
+	TCA_CBS_UNSPEC,
+	TCA_CBS_PARMS,
+	__TCA_CBS_MAX,
+};
+
+#define TCA_CBS_MAX (__TCA_CBS_MAX - 1)
+
 #endif
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC net-next 2/5] net/sched: Introduce Credit Based Shaper (CBS) qdisc
  2017-09-01  1:26 [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers Vinicius Costa Gomes
  2017-09-01  1:26 ` [RFC net-next 1/5] net/sched: Introduce the user API for the CBS shaper Vinicius Costa Gomes
@ 2017-09-01  1:26 ` Vinicius Costa Gomes
  2017-09-08 13:43   ` Henrik Austad
  2017-09-01  1:26 ` [RFC net-next 3/5] igb: Add support for CBS offload Vinicius Costa Gomes
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Vinicius Costa Gomes @ 2017-09-01  1:26 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, intel-wired-lan,
	andre.guedes, ivan.briano, jesus.sanchez-palencia,
	boon.leong.ong, richardcochran

This queueing discipline implements the shaper algorithm defined by
802.1Q-2014, Section 8.6.8.2, and detailed in Annex L.

Its primary use is to apply bandwidth reservation to user-defined
traffic classes, which are mapped to different queues via the mqprio
qdisc.

Initially, it only supports offloading the traffic shaping work to
controllers that support it in HW.

Later, when a software implementation is added, the current dependency
on being installed "under" mqprio can be lifted.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
---
 include/linux/netdevice.h |   1 +
 net/sched/Kconfig         |  11 ++
 net/sched/Makefile        |   1 +
 net/sched/sch_cbs.c       | 286 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 299 insertions(+)
 create mode 100644 net/sched/sch_cbs.c

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 35de8312e0b5..dd9a2ecd0c03 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -775,6 +775,7 @@ enum tc_setup_type {
 	TC_SETUP_CLSFLOWER,
 	TC_SETUP_CLSMATCHALL,
 	TC_SETUP_CLSBPF,
+	TC_SETUP_CBS,
 };
 
 /* These structures hold the attributes of xdp state that are being passed
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index e70ed26485a2..c03d86a7775e 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -172,6 +172,17 @@ config NET_SCH_TBF
 	  To compile this code as a module, choose M here: the
 	  module will be called sch_tbf.
 
+config NET_SCH_CBS
+	tristate "Credit Based Shaper (CBS)"
+	---help---
+	  Say Y here if you want to use the Credit Based Shaper (CBS) packet
+	  scheduling algorithm.
+
+	  See the top of <file:net/sched/sch_cbs.c> for more details.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called sch_cbs.
+
 config NET_SCH_GRED
 	tristate "Generic Random Early Detection (GRED)"
 	---help---
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 7b915d226de7..80c8f92d162d 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -52,6 +52,7 @@ obj-$(CONFIG_NET_SCH_FQ_CODEL)	+= sch_fq_codel.o
 obj-$(CONFIG_NET_SCH_FQ)	+= sch_fq.o
 obj-$(CONFIG_NET_SCH_HHF)	+= sch_hhf.o
 obj-$(CONFIG_NET_SCH_PIE)	+= sch_pie.o
+obj-$(CONFIG_NET_SCH_CBS)	+= sch_cbs.o
 
 obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)	+= cls_route.o
diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
new file mode 100644
index 000000000000..1c86a9e14150
--- /dev/null
+++ b/net/sched/sch_cbs.c
@@ -0,0 +1,286 @@
+/*
+ * net/sched/sch_cbs.c	Credit Based Shaper
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:	Vinicius Costa Gomes <vinicius.gomes@intel.com>
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <net/netlink.h>
+#include <net/sch_generic.h>
+#include <net/pkt_sched.h>
+
+struct cbs_sched_data {
+	struct Qdisc *qdisc; /* Inner qdisc, default - pfifo queue */
+	s32 queue;
+	s32 locredit;
+	s32 hicredit;
+	s32 sendslope;
+	s32 idleslope;
+};
+
+static int cbs_enqueue(struct sk_buff *skb, struct Qdisc *sch,
+		       struct sk_buff **to_free)
+{
+	struct cbs_sched_data *q = qdisc_priv(sch);
+	int ret;
+
+	ret = qdisc_enqueue(skb, q->qdisc, to_free);
+	if (ret != NET_XMIT_SUCCESS) {
+		if (net_xmit_drop_count(ret))
+			qdisc_qstats_drop(sch);
+		return ret;
+	}
+
+	qdisc_qstats_backlog_inc(sch, skb);
+	sch->q.qlen++;
+	return NET_XMIT_SUCCESS;
+}
+
+static struct sk_buff *cbs_dequeue(struct Qdisc *sch)
+{
+	struct cbs_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb;
+
+	skb = q->qdisc->ops->peek(q->qdisc);
+	if (skb) {
+		skb = qdisc_dequeue_peeked(q->qdisc);
+		if (unlikely(!skb))
+			return NULL;
+
+		qdisc_qstats_backlog_dec(sch, skb);
+		sch->q.qlen--;
+		qdisc_bstats_update(sch, skb);
+
+		return skb;
+	}
+	return NULL;
+}
+
+static void cbs_reset(struct Qdisc *sch)
+{
+	struct cbs_sched_data *q = qdisc_priv(sch);
+
+	qdisc_reset(q->qdisc);
+}
+
+static const struct nla_policy cbs_policy[TCA_CBS_MAX + 1] = {
+	[TCA_CBS_PARMS]	= { .len = sizeof(struct tc_cbs_qopt) },
+};
+
+static int cbs_change(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct cbs_sched_data *q = qdisc_priv(sch);
+	struct tc_cbs_qopt_offload cbs = { };
+	struct nlattr *tb[TCA_CBS_MAX + 1];
+	const struct net_device_ops *ops;
+	struct tc_cbs_qopt *qopt;
+	struct net_device *dev;
+	int err;
+
+	err = nla_parse_nested(tb, TCA_CBS_MAX, opt, cbs_policy, NULL);
+	if (err < 0)
+		return err;
+
+	err = -EINVAL;
+	if (!tb[TCA_CBS_PARMS])
+		goto done;
+
+	qopt = nla_data(tb[TCA_CBS_PARMS]);
+
+	dev = qdisc_dev(sch);
+	ops = dev->netdev_ops;
+
+	/* FIXME: this means that we can only install this qdisc
+	 * "under" mqprio. Do we need a more generic way to retrieve
+	 * the queue, or do we pass the netdev_queue to the driver?
+	 */
+	cbs.queue = TC_H_MIN(sch->parent) - 1 - netdev_get_num_tc(dev);
+
+	cbs.enable = 1;
+	cbs.hicredit = qopt->hicredit;
+	cbs.locredit = qopt->locredit;
+	cbs.idleslope = qopt->idleslope;
+	cbs.sendslope = qopt->sendslope;
+
+	err = -ENOTSUPP;
+	if (!ops->ndo_setup_tc)
+		goto done;
+
+	err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CBS, &cbs);
+	if (err < 0)
+		goto done;
+
+	q->queue = cbs.queue;
+	q->hicredit = cbs.hicredit;
+	q->locredit = cbs.locredit;
+	q->idleslope = cbs.idleslope;
+	q->sendslope = cbs.sendslope;
+
+done:
+	return err;
+}
+
+static int cbs_init(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct cbs_sched_data *q = qdisc_priv(sch);
+
+	if (!opt)
+		return -EINVAL;
+
+	q->qdisc = fifo_create_dflt(sch, &pfifo_qdisc_ops, 1024);
+	qdisc_hash_add(q->qdisc, true);
+
+	return cbs_change(sch, opt);
+}
+
+static void cbs_destroy(struct Qdisc *sch)
+{
+	struct cbs_sched_data *q = qdisc_priv(sch);
+	struct tc_cbs_qopt_offload cbs = { };
+	struct net_device *dev;
+	int err;
+
+	q->hicredit = 0;
+	q->locredit = 0;
+	q->idleslope = 0;
+	q->sendslope = 0;
+
+	dev = qdisc_dev(sch);
+
+	cbs.queue = q->queue;
+	cbs.enable = 0;
+
+	err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CBS, &cbs);
+	if (err < 0)
+		pr_warn("Couldn't reset queue %d to default values\n",
+			cbs.queue);
+
+	qdisc_destroy(q->qdisc);
+}
+
+static int cbs_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct cbs_sched_data *q = qdisc_priv(sch);
+	struct nlattr *nest;
+	struct tc_cbs_qopt opt;
+
+	sch->qstats.backlog = q->qdisc->qstats.backlog;
+	nest = nla_nest_start(skb, TCA_OPTIONS);
+	if (!nest)
+		goto nla_put_failure;
+
+	opt.hicredit = q->hicredit;
+	opt.locredit = q->locredit;
+	opt.sendslope = q->sendslope;
+	opt.idleslope = q->idleslope;
+
+	if (nla_put(skb, TCA_CBS_PARMS, sizeof(opt), &opt))
+		goto nla_put_failure;
+
+	return nla_nest_end(skb, nest);
+
+nla_put_failure:
+	nla_nest_cancel(skb, nest);
+	return -1;
+}
+
+static int cbs_dump_class(struct Qdisc *sch, unsigned long cl,
+			  struct sk_buff *skb, struct tcmsg *tcm)
+{
+	struct cbs_sched_data *q = qdisc_priv(sch);
+
+	tcm->tcm_handle |= TC_H_MIN(1);
+	tcm->tcm_info = q->qdisc->handle;
+
+	return 0;
+}
+
+static int cbs_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
+		     struct Qdisc **old)
+{
+	struct cbs_sched_data *q = qdisc_priv(sch);
+
+	if (!new)
+		new = &noop_qdisc;
+
+	*old = qdisc_replace(sch, new, &q->qdisc);
+	return 0;
+}
+
+static struct Qdisc *cbs_leaf(struct Qdisc *sch, unsigned long arg)
+{
+	struct cbs_sched_data *q = qdisc_priv(sch);
+
+	return q->qdisc;
+}
+
+static unsigned long cbs_find(struct Qdisc *sch, u32 classid)
+{
+	return 1;
+}
+
+static int cbs_delete(struct Qdisc *sch, unsigned long arg)
+{
+	return 0;
+}
+
+static void cbs_walk(struct Qdisc *sch, struct qdisc_walker *walker)
+{
+	if (!walker->stop) {
+		if (walker->count >= walker->skip)
+			if (walker->fn(sch, 1, walker) < 0) {
+				walker->stop = 1;
+				return;
+			}
+		walker->count++;
+	}
+}
+
+static const struct Qdisc_class_ops cbs_class_ops = {
+	.graft		=	cbs_graft,
+	.leaf		=	cbs_leaf,
+	.find		=	cbs_find,
+	.delete		=	cbs_delete,
+	.walk		=	cbs_walk,
+	.dump		=	cbs_dump_class,
+};
+
+static struct Qdisc_ops cbs_qdisc_ops __read_mostly = {
+	.next		=	NULL,
+	.cl_ops		=	&cbs_class_ops,
+	.id		=	"cbs",
+	.priv_size	=	sizeof(struct cbs_sched_data),
+	.enqueue	=	cbs_enqueue,
+	.dequeue	=	cbs_dequeue,
+	.peek		=	qdisc_peek_dequeued,
+	.init		=	cbs_init,
+	.reset		=	cbs_reset,
+	.destroy	=	cbs_destroy,
+	.change		=	cbs_change,
+	.dump		=	cbs_dump,
+	.owner		=	THIS_MODULE,
+};
+
+static int __init cbs_module_init(void)
+{
+	return register_qdisc(&cbs_qdisc_ops);
+}
+
+static void __exit cbs_module_exit(void)
+{
+	unregister_qdisc(&cbs_qdisc_ops);
+}
+module_init(cbs_module_init)
+module_exit(cbs_module_exit)
+MODULE_LICENSE("GPL");
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC net-next 3/5] igb: Add support for CBS offload
  2017-09-01  1:26 [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers Vinicius Costa Gomes
  2017-09-01  1:26 ` [RFC net-next 1/5] net/sched: Introduce the user API for the CBS shaper Vinicius Costa Gomes
  2017-09-01  1:26 ` [RFC net-next 2/5] net/sched: Introduce Credit Based Shaper (CBS) qdisc Vinicius Costa Gomes
@ 2017-09-01  1:26 ` Vinicius Costa Gomes
  2017-09-01  1:26 ` [RFC net-next 4/5] sample: Add TSN Talker and Listener examples Vinicius Costa Gomes
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 45+ messages in thread
From: Vinicius Costa Gomes @ 2017-09-01  1:26 UTC (permalink / raw)
  To: netdev
  Cc: Andre Guedes, jhs, xiyou.wangcong, jiri, intel-wired-lan,
	ivan.briano, jesus.sanchez-palencia, boon.leong.ong,
	richardcochran

From: Andre Guedes <andre.guedes@intel.com>

This patch adds support for Credit-Based Shaper (CBS) qdisc offload from
the Traffic Control system. This enables us to leverage the Forwarding
and Queuing for Time-Sensitive Streams (FQTSS) features of the Intel
i210 Ethernet Controller. FQTSS is the former 802.1Qav standard, which
was merged into 802.1Q in 2014. It enables traffic prioritization and
bandwidth reservation via the Credit-Based Shaper, which is implemented
in hardware by the i210 controller.

The patch introduces the igb_setup_tc() function which implements the
support for CBS qdisc hardware offload in the IGB driver. CBS offload
is the only traffic control offload supported by the driver at the
moment.

The FQTSS transmission mode of the i210 controller is automatically
enabled by the IGB driver when CBS is enabled for the first hardware
queue. Likewise, FQTSS mode is automatically disabled when CBS is
disabled for the last hardware queue. Changing the FQTSS mode requires
a NIC reset.

The FQTSS feature is supported by the i210 controller only.

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
---
 drivers/net/ethernet/intel/igb/e1000_defines.h |  23 ++
 drivers/net/ethernet/intel/igb/e1000_regs.h    |   8 +
 drivers/net/ethernet/intel/igb/igb.h           |   6 +
 drivers/net/ethernet/intel/igb/igb_main.c      | 349 +++++++++++++++++++++++++
 4 files changed, 386 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h b/drivers/net/ethernet/intel/igb/e1000_defines.h
index 1de82f247312..83cabff1e0ab 100644
--- a/drivers/net/ethernet/intel/igb/e1000_defines.h
+++ b/drivers/net/ethernet/intel/igb/e1000_defines.h
@@ -353,7 +353,18 @@
 #define E1000_RXPBS_CFG_TS_EN           0x80000000
 
 #define I210_RXPBSIZE_DEFAULT		0x000000A2 /* RXPBSIZE default */
+#define I210_RXPBSIZE_MASK		0x0000003F
+#define I210_RXPBSIZE_PB_32KB		0x00000020
 #define I210_TXPBSIZE_DEFAULT		0x04000014 /* TXPBSIZE default */
+#define I210_TXPBSIZE_MASK		0xC0FFFFFF
+#define I210_TXPBSIZE_PB0_8KB		(8 << 0)
+#define I210_TXPBSIZE_PB1_8KB		(8 << 6)
+#define I210_TXPBSIZE_PB2_4KB		(4 << 12)
+#define I210_TXPBSIZE_PB3_4KB		(4 << 18)
+
+#define I210_DTXMXPKTSZ_DEFAULT		0x00000098
+
+#define I210_SR_QUEUES_NUM		2
 
 /* SerDes Control */
 #define E1000_SCTL_DISABLE_SERDES_LOOPBACK 0x0400
@@ -1051,4 +1062,16 @@
 #define E1000_VLAPQF_P_VALID(_n)	(0x1 << (3 + (_n) * 4))
 #define E1000_VLAPQF_QUEUE_MASK	0x03
 
+/* TX Qav Control fields */
+#define E1000_TQAVCTRL_XMIT_MODE	BIT(0)
+#define E1000_TQAVCTRL_DATAFETCHARB	BIT(4)
+#define E1000_TQAVCTRL_DATATRANARB	BIT(8)
+
+/* TX Qav Credit Control fields */
+#define E1000_TQAVCC_IDLESLOPE_MASK	0xFFFF
+#define E1000_TQAVCC_QUEUEMODE		BIT(31)
+
+/* Transmit Descriptor Control fields */
+#define E1000_TXDCTL_PRIORITY		BIT(27)
+
 #endif
diff --git a/drivers/net/ethernet/intel/igb/e1000_regs.h b/drivers/net/ethernet/intel/igb/e1000_regs.h
index 58adbf234e07..8eee081d395f 100644
--- a/drivers/net/ethernet/intel/igb/e1000_regs.h
+++ b/drivers/net/ethernet/intel/igb/e1000_regs.h
@@ -421,6 +421,14 @@ do { \
 
 #define E1000_I210_FLA		0x1201C
 
+#define E1000_I210_DTXMXPKTSZ	0x355C
+
+#define E1000_I210_TXDCTL(_n)	(0x0E028 + ((_n) * 0x40))
+
+#define E1000_I210_TQAVCTRL	0x3570
+#define E1000_I210_TQAVCC(_n)	(0x3004 + ((_n) * 0x40))
+#define E1000_I210_TQAVHC(_n)	(0x300C + ((_n) * 0x40))
+
 #define E1000_INVM_DATA_REG(_n)	(0x12120 + 4*(_n))
 #define E1000_INVM_SIZE		64 /* Number of INVM Data Registers */
 
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 06ffb2bc713e..92845692087a 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -281,6 +281,11 @@ struct igb_ring {
 	u16 count;			/* number of desc. in the ring */
 	u8 queue_index;			/* logical index of the ring*/
 	u8 reg_idx;			/* physical index of the ring */
+	bool cbs_enable;		/* indicates if CBS is enabled */
+	s32 idleslope;			/* idleSlope in kbps */
+	s32 sendslope;			/* sendSlope in kbps */
+	s32 hicredit;			/* hiCredit in bytes */
+	s32 locredit;			/* loCredit in bytes */
 
 	/* everything past this point are written often */
 	u16 next_to_clean;
@@ -621,6 +626,7 @@ struct igb_adapter {
 #define IGB_FLAG_EEE			BIT(14)
 #define IGB_FLAG_VLAN_PROMISC		BIT(15)
 #define IGB_FLAG_RX_LEGACY		BIT(16)
+#define IGB_FLAG_FQTSS			BIT(17)
 
 /* Media Auto Sense */
 #define IGB_MAS_ENABLE_0		0X0001
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index fd4a46b03cc8..47cabca0c99a 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -62,6 +62,17 @@
 #define BUILD 0
 #define DRV_VERSION __stringify(MAJ) "." __stringify(MIN) "." \
 __stringify(BUILD) "-k"
+
+enum queue_mode {
+	QUEUE_MODE_STRICT_PRIORITY,
+	QUEUE_MODE_STREAM_RESERVATION,
+};
+
+enum tx_queue_prio {
+	TX_QUEUE_PRIO_HIGH,
+	TX_QUEUE_PRIO_LOW,
+};
+
 char igb_driver_name[] = "igb";
 char igb_driver_version[] = DRV_VERSION;
 static const char igb_driver_string[] =
@@ -1271,6 +1282,12 @@ static int igb_alloc_q_vector(struct igb_adapter *adapter,
 		ring->count = adapter->tx_ring_count;
 		ring->queue_index = txr_idx;
 
+		ring->cbs_enable = false;
+		ring->idleslope = 0;
+		ring->sendslope = 0;
+		ring->hicredit = 0;
+		ring->locredit = 0;
+
 		u64_stats_init(&ring->tx_syncp);
 		u64_stats_init(&ring->tx_syncp2);
 
@@ -1598,6 +1615,292 @@ static void igb_get_hw_control(struct igb_adapter *adapter)
 			ctrl_ext | E1000_CTRL_EXT_DRV_LOAD);
 }
 
+static void enable_fqtss(struct igb_adapter *adapter, bool enable)
+{
+	struct net_device *netdev = adapter->netdev;
+
+	if (enable)
+		adapter->flags |= IGB_FLAG_FQTSS;
+	else
+		adapter->flags &= ~IGB_FLAG_FQTSS;
+
+	if (netif_running(netdev))
+		schedule_work(&adapter->reset_task);
+	else
+		igb_reset(adapter);
+}
+
+static bool is_fqtss_enabled(struct igb_adapter *adapter)
+{
+	return (adapter->flags & IGB_FLAG_FQTSS) ? true : false;
+}
+
+static int set_tx_desc_fetch_prio(struct e1000_hw *hw, int queue,
+				  enum tx_queue_prio prio)
+{
+	u32 val;
+
+	WARN_ON(hw->mac.type != e1000_i210);
+
+	if (queue < 0 || queue > 4)
+		return -EINVAL;
+
+	val = rd32(E1000_I210_TXDCTL(queue));
+
+	if (prio == TX_QUEUE_PRIO_HIGH)
+		val |= E1000_TXDCTL_PRIORITY;
+	else
+		val &= ~E1000_TXDCTL_PRIORITY;
+
+	wr32(E1000_I210_TXDCTL(queue), val);
+	return 0;
+}
+
+static int set_queue_mode(struct e1000_hw *hw, int queue, enum queue_mode mode)
+{
+	u32 val;
+
+	WARN_ON(hw->mac.type != e1000_i210);
+
+	/* Stream reservation is only supported for queue 0 and 1. */
+	if (queue < 0 || queue > 1)
+		return -EINVAL;
+
+	val = rd32(E1000_I210_TQAVCC(queue));
+
+	if (mode == QUEUE_MODE_STREAM_RESERVATION)
+		val |= E1000_TQAVCC_QUEUEMODE;
+	else
+		val &= ~E1000_TQAVCC_QUEUEMODE;
+
+	wr32(E1000_I210_TQAVCC(queue), val);
+	return 0;
+}
+
+/**
+ *  igb_configure_cbs - Configure Credit-Based Shaper (CBS)
+ *  @adapter: pointer to adapter struct
+ *  @queue: queue number
+ *  @enable: true = enable CBS, false = disable CBS
+ *  @idleslope: idleSlope in kbps
+ *  @sendslope: sendSlope in kbps
+ *  @hicredit: hiCredit in bytes
+ *  @locredit: loCredit in bytes
+ *
+ *  Configure CBS for a given hardware queue. When disabling, the idleslope,
+ *  sendslope, hicredit and locredit arguments are ignored. Unlike the
+ *  helpers above, this function does not return a value.
+ **/
+static void igb_configure_cbs(struct igb_adapter *adapter, int queue,
+			      bool enable, int idleslope, int sendslope,
+			      int hicredit, int locredit)
+{
+	struct net_device *netdev = adapter->netdev;
+	struct e1000_hw *hw = &adapter->hw;
+	u32 tqavcc;
+	u16 value;
+
+	WARN_ON(hw->mac.type != e1000_i210);
+	WARN_ON(queue < 0 || queue > 1);
+	WARN_ON(adapter->num_tx_queues < 2);
+
+	if (enable) {
+		set_tx_desc_fetch_prio(hw, queue, TX_QUEUE_PRIO_HIGH);
+		set_queue_mode(hw, queue, QUEUE_MODE_STREAM_RESERVATION);
+
+		/* According to i210 datasheet section 7.2.7.7, we should set
+		 * the 'idleSlope' field from TQAVCC register following the
+		 * equation:
+		 *
+		 * For 100 Mbps link speed:
+		 *
+		 *     value = BW * 0x7735 * 0.2                          (E1)
+		 *
+		 * For 1000Mbps link speed:
+		 *
+		 *     value = BW * 0x7735 * 2                            (E2)
+		 *
+		 * E1 and E2 can be merged into one equation as shown below.
+		 * Note that 'link-speed' is in Mbps.
+		 *
+		 *     value = BW * 0x7735 * 2 * link-speed
+		 *                           --------------               (E3)
+		 *                                1000
+		 *
+		 * 'BW' is the percentage bandwidth out of full link speed
+		 * which can be found with the following equation. Note that
+		 * idleSlope here is the parameter from this function which
+		 * is in kbps.
+		 *
+		 *     BW =     idleSlope
+		 *          -----------------                             (E4)
+		 *          link-speed * 1000
+		 *
+		 * That said, we can come up with a generic equation to
+		 * calculate the value we should write to the TQAVCC register by
+		 * replacing 'BW' in E3 by E4. The resulting equation is:
+		 *
+		 * value =     idleSlope     * 0x7735 * 2 * link-speed
+		 *         -----------------            --------------    (E5)
+		 *         link-speed * 1000                 1000
+		 *
+		 * 'link-speed' is present in both sides of the fraction so
+		 * it is canceled out. The final equation is the following:
+		 *
+		 *     value = idleSlope * 61034
+		 *             -----------------                          (E6)
+		 *                  1000000
+		 */
+		value = DIV_ROUND_UP_ULL(idleslope * 61034ULL, 1000000);
+
+		tqavcc = rd32(E1000_I210_TQAVCC(queue));
+		tqavcc &= ~E1000_TQAVCC_IDLESLOPE_MASK;
+		tqavcc |= value;
+		wr32(E1000_I210_TQAVCC(queue), tqavcc);
+
+		wr32(E1000_I210_TQAVHC(queue), 0x80000000 + hicredit * 0x7735);
+	} else {
+		set_tx_desc_fetch_prio(hw, queue, TX_QUEUE_PRIO_LOW);
+		set_queue_mode(hw, queue, QUEUE_MODE_STRICT_PRIORITY);
+
+		/* Set idleSlope to zero. */
+		tqavcc = rd32(E1000_I210_TQAVCC(queue));
+		tqavcc &= ~E1000_TQAVCC_IDLESLOPE_MASK;
+		wr32(E1000_I210_TQAVCC(queue), tqavcc);
+
+		/* Set hiCredit to zero. */
+		wr32(E1000_I210_TQAVHC(queue), 0);
+	}
+
+	/* XXX: In the i210 controller, the sendSlope and loCredit
+	 * parameters from CBS are not configurable by software, so we
+	 * don't do any 'controller configuration' with respect to
+	 * these parameters.
+	 */
+
+	netdev_dbg(netdev, "CBS %s: queue %d idleslope %d sendslope %d "
+		   "hiCredit %d locredit %d\n",
+		   (enable) ? "enabled" : "disabled", queue,
+		   idleslope, sendslope, hicredit, locredit);
+}
+
+static void igb_save_cbs_params(struct igb_adapter *adapter, int queue,
+				bool enable, int idleslope, int sendslope,
+				int hicredit, int locredit)
+{
+	struct e1000_hw *hw = &adapter->hw;
+	struct igb_ring *ring;
+
+	WARN_ON(hw->mac.type != e1000_i210);
+	WARN_ON(queue < 0 || queue > 1);
+	WARN_ON(adapter->num_tx_queues < 2);
+
+	ring = adapter->tx_ring[queue];
+
+	ring->cbs_enable = enable;
+	ring->idleslope = idleslope;
+	ring->sendslope = sendslope;
+	ring->hicredit = hicredit;
+	ring->locredit = locredit;
+}
+
+static bool is_any_cbs_enabled(struct igb_adapter *adapter)
+{
+	struct igb_ring *ring;
+	int i;
+
+	WARN_ON(adapter->num_tx_queues < 2);
+
+	for (i = 0; i < I210_SR_QUEUES_NUM; i++) {
+		ring = adapter->tx_ring[i];
+
+		if (ring->cbs_enable)
+			return true;
+	}
+
+	return false;
+}
+
+static void igb_setup_tx_mode(struct igb_adapter *adapter)
+{
+	struct net_device *netdev = adapter->netdev;
+	struct e1000_hw *hw = &adapter->hw;
+	u32 val;
+	int i;
+
+	/* Only i210 controller supports changing the transmission mode. */
+	if (hw->mac.type != e1000_i210)
+		return;
+
+	if (is_fqtss_enabled(adapter)) {
+		/* Configure TQAVCTRL register: set transmit mode to 'Qav',
+		 * set data fetch arbitration to 'round robin' and set data
+		 * transfer arbitration to 'credit shaper algorithm'.
+		 */
+		val = rd32(E1000_I210_TQAVCTRL);
+		val |= E1000_TQAVCTRL_XMIT_MODE | E1000_TQAVCTRL_DATATRANARB;
+		val &= ~E1000_TQAVCTRL_DATAFETCHARB;
+		wr32(E1000_I210_TQAVCTRL, val);
+
+		/* Configure Tx and Rx packet buffers sizes as described in
+		 * i210 datasheet section 7.2.7.7.
+		 */
+		val = rd32(E1000_TXPBS);
+		val &= ~I210_TXPBSIZE_MASK;
+		val |= I210_TXPBSIZE_PB0_8KB | I210_TXPBSIZE_PB1_8KB |
+			I210_TXPBSIZE_PB2_4KB | I210_TXPBSIZE_PB3_4KB;
+		wr32(E1000_TXPBS, val);
+
+		val = rd32(E1000_RXPBS);
+		val &= ~I210_RXPBSIZE_MASK;
+		val |= I210_RXPBSIZE_PB_32KB;
+		wr32(E1000_RXPBS, val);
+
+		/* Section 8.12.9 states that MAX_TPKT_SIZE from DTXMXPKTSZ
+		 * register should not exceed the buffer size programmed in
+		 * TXPBS. The smallest buffer size programmed in TXPBS is 4kB
+		 * so according to the datasheet we should set MAX_TPKT_SIZE to
+		 * 4kB / 64.
+		 *
+		 * However, when we do so, no frames from queues 2 and 3 are
+		 * transmitted.  It seems MAX_TPKT_SIZE must not be greater
+		 * than or _equal_ to the buffer size programmed in TXPBS. For
+		 * this reason, we set MAX_TPKT_SIZE to (4kB - 1) / 64.
+		 */
+		val = (4096 - 1) / 64;
+		wr32(E1000_I210_DTXMXPKTSZ, val);
+
+		/* Since FQTSS mode is enabled, apply any CBS configuration
+		 * previously set. If no previous CBS configuration has been
+		 * done, then the initial configuration is applied, which means
+		 * CBS is disabled. CBS configuration is only supported by SR
+		 * queues i.e. queue 0 and queue 1.
+		 */
+		for (i = 0; i < I210_SR_QUEUES_NUM; i++) {
+			struct igb_ring *ring = adapter->tx_ring[i];
+
+			igb_configure_cbs(adapter, i, ring->cbs_enable,
+					  ring->idleslope, ring->sendslope,
+					  ring->hicredit, ring->locredit);
+		}
+	} else {
+		wr32(E1000_RXPBS, I210_RXPBSIZE_DEFAULT);
+		wr32(E1000_TXPBS, I210_TXPBSIZE_DEFAULT);
+		wr32(E1000_I210_DTXMXPKTSZ, I210_DTXMXPKTSZ_DEFAULT);
+
+		val = rd32(E1000_I210_TQAVCTRL);
+		/* According to Section 8.12.21, the other flags we've set when
+		 * enabling FQTSS are not relevant when disabling FQTSS, so we
+		 * don't clear them here.
+		 */
+		val &= ~E1000_TQAVCTRL_XMIT_MODE;
+		wr32(E1000_I210_TQAVCTRL, val);
+	}
+
+	netdev_dbg(netdev, "FQTSS %s\n", (is_fqtss_enabled(adapter)) ?
+		   "enabled" : "disabled");
+}
+
 /**
  *  igb_configure - configure the hardware for RX and TX
  *  @adapter: private board structure
@@ -1609,6 +1912,7 @@ static void igb_configure(struct igb_adapter *adapter)
 
 	igb_get_hw_control(adapter);
 	igb_set_rx_mode(netdev);
+	igb_setup_tx_mode(adapter);
 
 	igb_restore_vlan(adapter);
 
@@ -2150,6 +2454,50 @@ igb_features_check(struct sk_buff *skb, struct net_device *dev,
 	return features;
 }
 
+static int igb_setup_tc(struct net_device *dev, enum tc_setup_type type,
+			void *type_data)
+{
+	struct igb_adapter *adapter = netdev_priv(dev);
+	struct e1000_hw *hw = &adapter->hw;
+	struct tc_cbs_qopt_offload *cbs;
+
+	if (hw->mac.type != e1000_i210)
+		return -ENOTSUPP;
+
+	if (type != TC_SETUP_CBS)
+		return -ENOTSUPP;
+
+	/* In order to support FQTSS feature, we must have at least 2 Tx
+	 * queues enabled.
+	 */
+	if (adapter->num_tx_queues < 2)
+		return -ENOTSUPP;
+
+	cbs = type_data;
+
+	/* Only queues 0 and 1 support CBS configuration. */
+	if (cbs->queue < 0 || cbs->queue > 1)
+		return -EINVAL;
+
+	igb_save_cbs_params(adapter, cbs->queue, cbs->enable,
+			    cbs->idleslope, cbs->sendslope,
+			    cbs->hicredit, cbs->locredit);
+
+	if (is_fqtss_enabled(adapter)) {
+		igb_configure_cbs(adapter, cbs->queue, cbs->enable,
+				  cbs->idleslope, cbs->sendslope,
+				  cbs->hicredit, cbs->locredit);
+
+		if (!is_any_cbs_enabled(adapter))
+			enable_fqtss(adapter, false);
+
+	} else {
+		enable_fqtss(adapter, true);
+	}
+
+	return 0;
+}
+
 static const struct net_device_ops igb_netdev_ops = {
 	.ndo_open		= igb_open,
 	.ndo_stop		= igb_close,
@@ -2175,6 +2523,7 @@ static const struct net_device_ops igb_netdev_ops = {
 	.ndo_set_features	= igb_set_features,
 	.ndo_fdb_add		= igb_ndo_fdb_add,
 	.ndo_features_check	= igb_features_check,
+	.ndo_setup_tc		= igb_setup_tc,
 };
 
 /**
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC net-next 4/5] sample: Add TSN Talker and Listener examples
  2017-09-01  1:26 [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers Vinicius Costa Gomes
                   ` (2 preceding siblings ...)
  2017-09-01  1:26 ` [RFC net-next 3/5] igb: Add support for CBS offload Vinicius Costa Gomes
@ 2017-09-01  1:26 ` Vinicius Costa Gomes
  2017-09-01  1:26 ` [RFC net-next 5/5] samples/tsn: Add script for calculating CBS config Vinicius Costa Gomes
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 45+ messages in thread
From: Vinicius Costa Gomes @ 2017-09-01  1:26 UTC (permalink / raw)
  To: netdev
  Cc: Jesus Sanchez-Palencia, jhs, xiyou.wangcong, jiri,
	intel-wired-lan, andre.guedes, ivan.briano, boon.leong.ong,
	richardcochran, Vinicius Costa Gomes

From: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>

Add two examples so one can easily test a 'TSN distributed system'
running with standard kernel interfaces. Both 'talker' and 'listener'
sides are provided, and both use an AF_PACKET socket for Tx / Rx of
frames.

Running the examples is rather simple.
For the talker, just the interface and SO_PRIORITY are expected as
parameters:

$ ./talker -i enp3s0 -p 3

For the listener, only the interface is needed:

$ ./listener -i enp3s0

The multicast MAC address being used is currently hardcoded in both
examples for simplicity.

Note that the listener side uses a BPF filter so only frames sent to
the correct "stream" destination address are received by the socket.
If you modify the address used by the talker, you must also adapt
the BPF filter otherwise no frames will be received by the socket.
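(In the listener source, this means updating the 0xbbaabbaa and 0x0000bbaa
constants in dst_addr_filter[] so they match the first 32 bits and the last
16 bits of the new destination MAC address.)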

The listener example prints the packet reception rate once every
second. This makes it easier to verify whether the bandwidth configured
for a traffic class is being respected or not.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: Iván Briano <ivan.briano@intel.com>
---
 samples/tsn/listener.c | 254 +++++++++++++++++++++++++++++++++++++++++++++++++
 samples/tsn/talker.c   | 136 ++++++++++++++++++++++++++
 2 files changed, 390 insertions(+)
 create mode 100644 samples/tsn/listener.c
 create mode 100644 samples/tsn/talker.c

diff --git a/samples/tsn/listener.c b/samples/tsn/listener.c
new file mode 100644
index 000000000000..2d17bdfbea99
--- /dev/null
+++ b/samples/tsn/listener.c
@@ -0,0 +1,254 @@
+/*
+ * Copyright (c) 2017, Intel Corporation
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ *     * Redistributions of source code must retain the above copyright notice,
+ *       this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in the
+ *       documentation and/or other materials provided with the distribution.
+ *     * Neither the name of Intel Corporation nor the names of its contributors
+ *       may be used to endorse or promote products derived from this software
+ *       without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <alloca.h>
+#include <argp.h>
+#include <arpa/inet.h>
+#include <inttypes.h>
+#include <linux/filter.h>
+#include <linux/if.h>
+#include <linux/if_ether.h>
+#include <linux/if_packet.h>
+#include <poll.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <sys/timerfd.h>
+#include <unistd.h>
+
+#define MAX_FRAME_SIZE 1500
+
+/* XXX: If this address is changed, the BPF filter must be adjusted. */
+static uint8_t multicast_macaddr[] = { 0xBB, 0xAA, 0xBB, 0xAA, 0xBB, 0xAA };
+static char ifname[IFNAMSIZ];
+static uint64_t data_count;
+static int arg_count;
+
+/*
+ * BPF Filter so we only receive frames from the destination MAC address of our
+ * SRP stream. This is hardcoded in multicast_macaddr[].
+ */
+static struct sock_filter dst_addr_filter[] = {
+	{ 0x20,  0,  0, 0000000000 }, /* Load DST address: first 32bits only */
+	{ 0x15,  0,  3, 0xbbaabbaa }, /* Compare with first 32bits from MAC */
+	{ 0x28,  0,  0, 0x00000004 }, /* Load DST address: remaining 16bits */
+	{ 0x15,  0,  1, 0x0000bbaa }, /* Compare with last 16bits from MAC */
+	{ 0x06,  0,  0, 0xffffffff }, /* Ret: accept frame (both matched) */
+	{ 0x06,  0,  0, 0000000000 }, /* Ret 0. Jump here if any mismatches. */
+};
+
+/* BPF program */
+static struct sock_fprog bpf = {
+	.len = 6, /* Number of instructions on BPF filter */
+	.filter = dst_addr_filter,
+};
+
+static struct argp_option options[] = {
+	{"ifname", 'i', "IFNAME", 0, "Network Interface" },
+	{ 0 }
+};
+
+static error_t parser(int key, char *arg, struct argp_state *s)
+{
+	switch (key) {
+	case 'i':
+		strncpy(ifname, arg, sizeof(ifname) - 1);
+		arg_count++;
+		break;
+	case ARGP_KEY_END:
+		if (arg_count < 1)
+			argp_failure(s, 1, 0, "Options missing. Check --help");
+		break;
+	}
+
+	return 0;
+}
+
+static struct argp argp = { options, parser };
+
+static int setup_1s_timer(void)
+{
+	struct itimerspec tspec = { 0 };
+	int fd, res;
+
+	fd = timerfd_create(CLOCK_MONOTONIC, 0);
+	if (fd < 0) {
+		perror("Couldn't create timer");
+		return -1;
+	}
+
+	tspec.it_value.tv_sec = 1;
+	tspec.it_interval.tv_sec = 1;
+
+	res = timerfd_settime(fd, 0, &tspec, NULL);
+	if (res < 0) {
+		perror("Couldn't set timer");
+		close(fd);
+		return -1;
+	}
+
+	return fd;
+}
+
+static int setup_socket(void)
+{
+	struct sockaddr_ll sk_addr = {
+		.sll_family = AF_PACKET,
+		.sll_protocol = htons(ETH_P_TSN),
+	};
+	struct packet_mreq mreq;
+	struct ifreq req;
+	int fd, res;
+
+	fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_TSN));
+	if (fd < 0) {
+		perror("Couldn't open socket");
+		return -1;
+	}
+
+	res = setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &bpf, sizeof(bpf));
+	if (res < 0) {
+		perror("Couldn't attach bpf filter");
+		goto err;
+	}
+
+	strncpy(req.ifr_name, ifname, sizeof(req.ifr_name));
+	res = ioctl(fd, SIOCGIFINDEX, &req);
+	if (res < 0) {
+		perror("Couldn't get interface index");
+		goto err;
+	}
+
+	sk_addr.sll_ifindex = req.ifr_ifindex;
+	res = bind(fd, (struct sockaddr *) &sk_addr, sizeof(sk_addr));
+	if (res < 0) {
+		perror("Couldn't bind() to interface");
+		goto err;
+	}
+
+	/* Use PACKET_ADD_MEMBERSHIP to add a binding to the Multicast Addr */
+	mreq.mr_ifindex = sk_addr.sll_ifindex;
+	mreq.mr_type = PACKET_MR_MULTICAST;
+	mreq.mr_alen = ETH_ALEN;
+	memcpy(&mreq.mr_address, multicast_macaddr, ETH_ALEN);
+
+	res = setsockopt(fd, SOL_PACKET, PACKET_ADD_MEMBERSHIP,
+				&mreq, sizeof(struct packet_mreq));
+	if (res < 0) {
+		perror("Couldn't set PACKET_ADD_MEMBERSHIP");
+		goto err;
+	}
+
+	return fd;
+
+err:
+	close(fd);
+	return -1;
+}
+
+static void recv_packet(int fd)
+{
+	uint8_t *data = alloca(MAX_FRAME_SIZE);
+	ssize_t n = recv(fd, data, MAX_FRAME_SIZE, 0);
+
+	if (n < 0) {
+		perror("Failed to receive data");
+		return;
+	}
+
+	if (n != MAX_FRAME_SIZE)
+		printf("Size mismatch: expected %d, got %zd\n",
+		       MAX_FRAME_SIZE, n);
+
+	data_count += n;
+}
+
+static void report_bw(int fd)
+{
+	uint64_t expirations;
+	ssize_t n = read(fd, &expirations, sizeof(uint64_t));
+
+	if (n < 0) {
+		perror("Couldn't read timerfd");
+		return;
+	}
+
+	if (expirations != 1)
+		printf("Something went wrong with timerfd\n");
+
+	/* Report how much data was received in 1s. */
+	printf("Data rate: %zu kbps\n", (data_count * 8) / 1000);
+
+	data_count = 0;
+}
+
+int main(int argc, char *argv[])
+{
+	int sk_fd, timer_fd, res;
+	struct pollfd fds[2];
+
+	argp_parse(&argp, argc, argv, 0, NULL, NULL);
+
+	sk_fd = setup_socket();
+	if (sk_fd < 0)
+		return 1;
+
+	timer_fd = setup_1s_timer();
+	if (timer_fd < 0) {
+		close(sk_fd);
+		return 1;
+	}
+
+	fds[0].fd = sk_fd;
+	fds[0].events = POLLIN;
+	fds[1].fd = timer_fd;
+	fds[1].events = POLLIN;
+
+	printf("Waiting for packets...\n");
+
+	while (1) {
+		res = poll(fds, 2, -1);
+		if (res < 0) {
+			perror("Error on poll()");
+			goto err;
+		}
+
+		if (fds[0].revents & POLLIN)
+			recv_packet(fds[0].fd);
+
+		if (fds[1].revents & POLLIN)
+			report_bw(fds[1].fd);
+	}
+
+err:
+	close(timer_fd);
+	close(sk_fd);
+	return 1;
+}
diff --git a/samples/tsn/talker.c b/samples/tsn/talker.c
new file mode 100644
index 000000000000..35e6f99b48f6
--- /dev/null
+++ b/samples/tsn/talker.c
@@ -0,0 +1,136 @@
+/*
+ * Copyright (c) 2017, Intel Corporation
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ *     * Redistributions of source code must retain the above copyright notice,
+ *       this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in the
+ *       documentation and/or other materials provided with the distribution.
+ *     * Neither the name of Intel Corporation nor the names of its contributors
+ *       may be used to endorse or promote products derived from this software
+ *       without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <alloca.h>
+#include <argp.h>
+#include <arpa/inet.h>
+#include <inttypes.h>
+#include <linux/if.h>
+#include <linux/if_ether.h>
+#include <linux/if_packet.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <unistd.h>
+
+#define MAX_FRAME_SIZE 1500
+
+static uint8_t multicast_macaddr[] = { 0xBB, 0xAA, 0xBB, 0xAA, 0xBB, 0xAA };
+static char ifname[IFNAMSIZ];
+static int prio = -1;
+static int arg_count;
+
+static struct argp_option options[] = {
+	{"ifname", 'i', "IFNAME", 0, "Network Interface" },
+	{"prio", 'p', "NUM", 0, "SO_PRIORITY to be set in socket" },
+	{ 0 }
+};
+
+static error_t parser(int key, char *arg, struct argp_state *s)
+{
+	switch (key) {
+	case 'i':
+		strncpy(ifname, arg, sizeof(ifname) - 1);
+		arg_count++;
+		break;
+	case 'p':
+		prio = atoi(arg);
+		if (prio < 0)
+			argp_failure(s, 1, 0, "Priority must be >=0\n");
+		arg_count++;
+		break;
+	case ARGP_KEY_END:
+		if (arg_count < 2)
+			argp_failure(s, 1, 0,
+				     "Options missing. Check --help\n");
+		break;
+	}
+
+	return 0;
+}
+
+static struct argp argp = { options, parser };
+
+int main(int argc, char *argv[])
+{
+	struct sockaddr_ll dst_ll_addr = {
+		.sll_family = AF_PACKET,
+		.sll_protocol = htons(ETH_P_TSN),
+		.sll_halen = ETH_ALEN,
+	};
+	struct ifreq req;
+	uint8_t *payload;
+	int fd, res;
+
+	argp_parse(&argp, argc, argv, 0, NULL, NULL);
+
+	fd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_TSN));
+	if (fd < 0) {
+		perror("Couldn't open socket");
+		return 1;
+	}
+
+	strncpy(req.ifr_name, ifname, sizeof(req.ifr_name));
+	res = ioctl(fd, SIOCGIFINDEX, &req);
+	if (res < 0) {
+		perror("Couldn't get interface index");
+		goto err;
+	}
+
+	dst_ll_addr.sll_ifindex = req.ifr_ifindex;
+	memcpy(&dst_ll_addr.sll_addr, multicast_macaddr, ETH_ALEN);
+
+	res = setsockopt(fd, SOL_SOCKET, SO_PRIORITY, &prio, sizeof(prio));
+	if (res < 0) {
+		perror("Couldn't set priority");
+		goto err;
+	}
+
+	payload = alloca(MAX_FRAME_SIZE);
+	memset(payload, 0xBE, MAX_FRAME_SIZE);
+
+	printf("Sending packets...\n");
+
+	while (1) {
+		ssize_t n = sendto(fd, payload, MAX_FRAME_SIZE, 0,
+				(struct sockaddr *) &dst_ll_addr,
+				sizeof(dst_ll_addr));
+
+		if (n < 0)
+			perror("Failed to send data");
+
+		/* Sleep for 500us to avoid starvation from a 20Mbps stream. */
+		usleep(500);
+	}
+
+err:
+	close(fd);
+	return 1;
+}
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC net-next 5/5] samples/tsn: Add script for calculating CBS config
  2017-09-01  1:26 [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers Vinicius Costa Gomes
                   ` (3 preceding siblings ...)
  2017-09-01  1:26 ` [RFC net-next 4/5] sample: Add TSN Talker and Listener examples Vinicius Costa Gomes
@ 2017-09-01  1:26 ` Vinicius Costa Gomes
  2017-09-01 13:03 ` [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers Richard Cochran
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 45+ messages in thread
From: Vinicius Costa Gomes @ 2017-09-01  1:26 UTC (permalink / raw)
  To: netdev
  Cc: Andre Guedes, jhs, xiyou.wangcong, jiri, intel-wired-lan,
	ivan.briano, jesus.sanchez-palencia, boon.leong.ong,
	richardcochran

From: Andre Guedes <andre.guedes@intel.com>

Add a script that takes as input the parameters of the Credit-Based
Shaper used by FQTSS - link rate, maximum frame size of best-effort
traffic, and the idleslope and maximum frame size of the time-sensitive
traffic for SR classes A and B - and calculates how the CBS qdisc must
be configured for each traffic class.

For example, if you want to have Class A with a bandwidth of 300 Mbps
and Class B of 200 Mbps, and the max frame size of both classes'
traffic is 1500 bytes:

$ ./calculate_cbs_params.py -A 300000 -a 1500 -B 200000 -b 1500

would give you the correct cbs qdisc config command lines to be used.

This script is just a helper to ease testing of the TSN samples -
talker and listener - and shouldn't be taken as highly accurate.
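
For reference, with the default 1Gbps link speed and 1500 byte non-SR maximum
frame size, the command above would print something along the lines of:

Class A --> # tc qdisc replace dev IFACE parent ID cbs locredit -1050 hicredit 450 sendslope -700000 idleslope 300000
Class B --> # tc qdisc replace dev IFACE parent ID cbs locredit -1200 hicredit 729 sendslope -800000 idleslope 200000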

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
---
 samples/tsn/calculate_cbs_params.py | 112 ++++++++++++++++++++++++++++++++++++
 1 file changed, 112 insertions(+)
 create mode 100755 samples/tsn/calculate_cbs_params.py

diff --git a/samples/tsn/calculate_cbs_params.py b/samples/tsn/calculate_cbs_params.py
new file mode 100755
index 000000000000..9c46210b699f
--- /dev/null
+++ b/samples/tsn/calculate_cbs_params.py
@@ -0,0 +1,112 @@
+#!/usr/bin/env python3
+#
+# Copyright (c) 2017, Intel Corporation
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+#     * Redistributions of source code must retain the above copyright notice,
+#       this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in the
+#       documentation and/or other materials provided with the distribution.
+#     * Neither the name of Intel Corporation nor the names of its contributors
+#       may be used to endorse or promote products derived from this software
+#       without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
+# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+import argparse
+import math
+import sys
+
+def print_cbs_params_for_class_a(args):
+    idleslope = args.idleslope_a
+    sendslope = idleslope - args.link_speed
+
+    # According to 802.1Q-2014 spec, Annex L, hiCredit and
+    # loCredit for SR class A are calculated following the
+    # equations L-10 and L-12, respectively.
+    hicredit = math.ceil(idleslope * args.frame_non_sr / args.link_speed)
+    locredit = math.ceil(sendslope * args.frame_a / args.link_speed)
+
+    print("Class A --> # tc qdisc replace dev IFACE parent ID cbs " \
+          "locredit %d hicredit %d sendslope %d idleslope %d" % \
+          (locredit, hicredit, sendslope, idleslope))
+
+def print_cbs_params_for_class_b(args):
+    idleslope = args.idleslope_b
+    sendslope = idleslope - args.link_speed
+
+    # Annex L doesn't present a straightforward equation to
+    # calculate hiCredit for Class B so we have to derive it
+    # based on generic equations presented in that Annex.
+    #
+    # L-3 is the primary equation to calculate hiCredit. Section
+    # L.2 states that the 'maxInterferenceSize' for SR class B
+    # is the maximum burst size for SR class A plus the
+    # maxInterferenceSize from SR class A (which is equal to the
+    # maximum frame from non-SR traffic).
+    #
+    # The maximum burst size for SR class A equation is shown in
+    # L-16. Merging L-16 into L-3 we get the resulting equation
+    # which calculates hiCredit B (refer to section L.3 in case
+    # you're not familiar with the legend):
+    #
+    # hiCredit B = Rb * (     Mo         Ma   )
+    #                     ---------- + ------
+    #                      Ro - Ra       Ro
+    #
+    hicredit = math.ceil(idleslope * \
+               ((args.frame_non_sr / (args.link_speed - args.idleslope_a)) + \
+               (args.frame_a / args.link_speed)))
+
+    # loCredit B is calculated following equation L-2.
+    locredit = math.ceil(sendslope * args.frame_b / args.link_speed)
+
+    print("Class B --> # tc qdisc replace dev IFACE parent ID cbs " \
+          "locredit %d hicredit %d sendslope %d idleslope %d" % \
+          (locredit, hicredit, sendslope, idleslope))
+
+def main():
+    parser = argparse.ArgumentParser()
+
+    parser.add_argument('-S', dest='link_speed', default=1000000.0, type=float,
+                        help='Link speed in kbps (default=1000000)')
+    parser.add_argument('-s', dest='frame_non_sr', default=1500.0, type=float,
+                        help='Maximum frame size from non-SR traffic (MTU size'
+                        ' usually, default=1500)')
+    parser.add_argument('-A', dest='idleslope_a', default=0, type=float,
+                        help='Idleslope for SR class A in kbps')
+    parser.add_argument('-a', dest='frame_a', default=0, type=float,
+                        help='Maximum frame size for SR class A traffic')
+    parser.add_argument('-B', dest='idleslope_b', default=0, type=float,
+                        help='Idleslope for SR class B in kbps')
+    parser.add_argument('-b', dest='frame_b', default=0, type=float,
+                        help='Maximum frame size for SR class B traffic')
+
+    args = parser.parse_args()
+
+    if not len(sys.argv) > 1:
+        parser.print_help()
+    else:
+        print("\nConfiguration lines to be used are:")
+
+    if args.idleslope_a > 0:
+        print_cbs_params_for_class_a(args)
+
+    if args.idleslope_b > 0:
+        print_cbs_params_for_class_b(args)
+
+
+if __name__ == "__main__":
+    main()
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-01  1:26 [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers Vinicius Costa Gomes
                   ` (4 preceding siblings ...)
  2017-09-01  1:26 ` [RFC net-next 5/5] samples/tsn: Add script for calculating CBS config Vinicius Costa Gomes
@ 2017-09-01 13:03 ` Richard Cochran
  2017-09-01 16:12   ` Jesus Sanchez-Palencia
  2017-09-07  5:34 ` Henrik Austad
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Richard Cochran @ 2017-09-01 13:03 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, jhs, xiyou.wangcong, jiri, intel-wired-lan, andre.guedes,
	ivan.briano, jesus.sanchez-palencia, boon.leong.ong


I am happy to see this posted.  At first glance, it seems like a step in
the right direction.

On Thu, Aug 31, 2017 at 06:26:20PM -0700, Vinicius Costa Gomes wrote:
>  * Time-aware shaper (802.1Qbv):
...
>    S 0x01 300
>    S 0x03 500
> 
>    This means that there are two intervals, the first will have the gate
>    for traffic class 0 open for 300 nanoseconds, the second will have
>    both traffic classes open for 500 nanoseconds.

The i210 doesn't support this in HW, or does it?

>  * Frame Preemption (802.1Qbu):
> 
>    To control even further the latency, it may prove useful to signal which
>    traffic classes are marked as preemptable. For that, 'taprio' provides the
>    preemption command so you set each traffic class as preemptable or not:
> 
>    $ tc qdisc (...) \
>         preemption 0 1 1 1

Neither can the i210 preempt frames, or what am I missing?

The timing of this RFC is good, as I am just finishing up an RFC that
implements time-based transmit using the i210.  I'll try and get that
out ASAP.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-01 13:03 ` [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers Richard Cochran
@ 2017-09-01 16:12   ` Jesus Sanchez-Palencia
  2017-09-01 16:53     ` Richard Cochran
  2017-09-05  7:20     ` Richard Cochran
  0 siblings, 2 replies; 45+ messages in thread
From: Jesus Sanchez-Palencia @ 2017-09-01 16:12 UTC (permalink / raw)
  To: Richard Cochran, Vinicius Costa Gomes
  Cc: netdev, jhs, xiyou.wangcong, jiri, intel-wired-lan, andre.guedes,
	ivan.briano, boon.leong.ong

Hi Richard,


On 09/01/2017 06:03 AM, Richard Cochran wrote:
> 
> I happy to see this posted.  At first glance, it seems like a step in
> the right direction.
> 
> On Thu, Aug 31, 2017 at 06:26:20PM -0700, Vinicius Costa Gomes wrote:
>>  * Time-aware shaper (802.1Qbv):
> ...
>>    S 0x01 300
>>    S 0x03 500
>>
>>    This means that there are two intervals, the first will have the gate
>>    for traffic class 0 open for 300 nanoseconds, the second will have
>>    both traffic classes open for 500 nanoseconds.
> 
> The i210 doesn't support this in HW, or does it?


No, it does not. The i210 only provides support for a per-packet feature called
LaunchTime that can be used to control both the fetch and the transmission time
of packets.


> 
>>  * Frame Preemption (802.1Qbu):
>>
>>    To control even further the latency, it may prove useful to signal which
>>    traffic classes are marked as preemptable. For that, 'taprio' provides the
>>    preemption command so you set each traffic class as preemptable or not:
>>
>>    $ tc qdisc (...) \
>>         preemption 0 1 1 1
> 
> Neither can the i210 preempt frames, or what am I missing?

No, it does not.

But when we started working on the shapers we decided to look ahead and try to
come up with interfaces that could cover beyond 802.1Qav. These are just some
ideas we've been prototyping here together with the 'cbs' qdisc.


> 
> The timing of this RFC is good, as I am just finishing up an RFC that
> implements time-based transmit using the i210.  I'll try and get that
> out ASAP.


Is it correct to assume you are referring to an interface for Launchtime here?


Thanks,
Jesus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-01 16:12   ` Jesus Sanchez-Palencia
@ 2017-09-01 16:53     ` Richard Cochran
  2017-09-05  7:20     ` Richard Cochran
  1 sibling, 0 replies; 45+ messages in thread
From: Richard Cochran @ 2017-09-01 16:53 UTC (permalink / raw)
  To: Jesus Sanchez-Palencia
  Cc: Vinicius Costa Gomes, netdev, jhs, xiyou.wangcong, jiri,
	intel-wired-lan, andre.guedes, ivan.briano, boon.leong.ong

On Fri, Sep 01, 2017 at 09:12:17AM -0700, Jesus Sanchez-Palencia wrote:
> Is it correct to assume you are referring to an interface for Launchtime here?

Yes.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-01 16:12   ` Jesus Sanchez-Palencia
  2017-09-01 16:53     ` Richard Cochran
@ 2017-09-05  7:20     ` Richard Cochran
  1 sibling, 0 replies; 45+ messages in thread
From: Richard Cochran @ 2017-09-05  7:20 UTC (permalink / raw)
  To: Jesus Sanchez-Palencia
  Cc: Vinicius Costa Gomes, netdev, jhs, xiyou.wangcong, jiri,
	intel-wired-lan, andre.guedes, ivan.briano, boon.leong.ong

On Fri, Sep 01, 2017 at 09:12:17AM -0700, Jesus Sanchez-Palencia wrote:
> On 09/01/2017 06:03 AM, Richard Cochran wrote:
> > The timing of this RFC is good, as I am just finishing up an RFC that
> > implements time-based transmit using the i210.  I'll try and get that
> > out ASAP.

I have an RFC series ready for net-next, but the merge window just
started.  I'll post it when the window closes again...

Thanks,
Richard

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-01  1:26 [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers Vinicius Costa Gomes
                   ` (5 preceding siblings ...)
  2017-09-01 13:03 ` [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers Richard Cochran
@ 2017-09-07  5:34 ` Henrik Austad
  2017-09-07 12:40   ` Richard Cochran
                     ` (2 more replies)
  2017-09-18  8:02 ` Richard Cochran
                   ` (2 subsequent siblings)
  9 siblings, 3 replies; 45+ messages in thread
From: Henrik Austad @ 2017-09-07  5:34 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, jhs, xiyou.wangcong, jiri, intel-wired-lan, andre.guedes,
	ivan.briano, jesus.sanchez-palencia, boon.leong.ong,
	richardcochran

[-- Attachment #1: Type: text/plain, Size: 12931 bytes --]

On Thu, Aug 31, 2017 at 06:26:20PM -0700, Vinicius Costa Gomes wrote:
> Hi,
> 
> This patchset is an RFC on a proposal of how the Traffic Control subsystem can
> be used to offload the configuration of traffic shapers into network devices
> that provide support for them in HW. Our goal here is to start upstreaming
> support for features related to the Time-Sensitive Networking (TSN) set of
> standards into the kernel.

Nice to see that others are working on this as well! :)

A short disclaimer: I'm pretty much anchored in the view "Linux is the 
end-station in a TSN domain". Is this your approach as well, or are you 
looking at this driver to be used in bridges as well? (Because that will 
affect the comments on the time-aware shaper and frame preemption.)

Yet another disclaimer: I am not a Linux networking subsystem expert. Not 
by a long shot! There is black magic happening in the internals of the 
networking subsystem that I am not even aware of. So if something I say or 
ask does not make sense _at_all_, that's probably why.

I do know a tiny bit about TSN though, and I have been messing around 
with it for a little while, hence my comments below

> As part of this work, we've assessed previous public discussions related to TSN
> enabling: patches from Henrik Austad (Cisco), the presentation from Eric Mann
> at Linux Plumbers 2012, patches from Gangfeng Huang (National Instruments) and
> the current state of the OpenAVNU project (https://github.com/AVnu/OpenAvnu/).

/me eyes Cc ;p

> Overview
> ========
> 
> Time-sensitive Networking (TSN) is a set of standards that aim to address
> resources availability for providing bandwidth reservation and bounded latency
> on Ethernet based LANs. The proposal described here aims to cover mainly what is
> needed to enable the following standards: 802.1Qat, 802.1Qav, 802.1Qbv and
> 802.1Qbu.
> 
> The initial target of this work is the Intel i210 NIC, but other controllers'
> datasheet were also taken into account, like the Renesas RZ/A1H RZ/A1M group and
> the Synopsis DesignWare Ethernet QoS controller.

NXP has a TSN aware chip on the i.MX7 sabre board as well </fyi>

> Proposal
> ========
> 
> Feature-wise, what is covered here are configuration interfaces for HW
> implementations of the Credit-Based shaper (CBS, 802.1Qav), Time-Aware shaper
> (802.1Qbv) and Frame Preemption (802.1Qbu). CBS is a per-queue shaper, while
> Qbv and Qbu must be configured per port, with the configuration covering all
> queues. Given that these features are related to traffic shaping, and that the
> traffic control subsystem already provides a queueing discipline that offloads
> config into the device driver (i.e. mqprio), designing new qdiscs for the
> specific purpose of offloading the config for each shaper seemed like a good
> fit.

just to be clear, you register sch_cbs as a subclass to mqprio, not as a 
root class?

> For steering traffic into the correct queues, we use the socket option
> SO_PRIORITY and then a mechanism to map priority to traffic classes / Tx queues.
> The qdisc mqprio is currently used in our tests.

Right, fair enough. I'd prefer the TSN qdisc to be the root qdisc and 
rather have mqprio for high-priority traffic and another for 'everything 
else', but this would work too. This is not that relevant at this stage I 
guess :)

> As for the shapers config interface:
> 
>  * CBS (802.1Qav)
> 
>    This patchset is proposing a new qdisc called 'cbs'. Its 'tc' cmd line is:
>    $ tc qdisc add dev IFACE parent ID cbs locredit N hicredit M sendslope S \
>      idleslope I

So this confuses me a bit: why specify sendSlope?

    sendSlope = idleSlope - portTransmitRate

and portTransmitRate is the speed of the MAC (which you get from the 
driver). Adding sendSlope here is just redundant, I think.

Also, does this mean that when you create the qdisc, you have locked the 
bandwidth for the scheduler? Meaning, if I later want to add another 
stream that requires more bandwidth, I have to close all active streams, 
reconfigure the qdisc and then restart?

>    Note that the parameters for this qdisc are the ones defined by the
>    802.1Q-2014 spec, so no hardware specific functionality is exposed here.

You do need to know if the link is brought up as 100 or 1000 though - which 
the driver already knows.

>  * Time-aware shaper (802.1Qbv):
> 
>    The idea we are currently exploring is to add a "time-aware", priority based
>    qdisc, that also exposes the Tx queues available and provides a mechanism for
>    mapping priority <-> traffic class <-> Tx queues in a similar fashion as
>    mqprio. We are calling this qdisc 'taprio', and its 'tc' cmd line would be:

As far as I know, this is not supported by the i210, and if time-aware shaping 
is enabled in the network you'll be queued on a bridge until the window 
opens, since time-aware shaping is enforced on the tx port and not on rx. Is 
this required in this driver?

>    $ $ tc qdisc add dev ens4 parent root handle 100 taprio num_tc 4    \
>      	   map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3                         \
> 	   queues 0 1 2 3                                              \
>      	   sched-file gates.sched [base-time <interval>]               \
>            [cycle-time <interval>] [extension-time <interval>]

That was a lot of priorities! 802.1Q lists 8 priorities, so where do these 
16 come from?

You map pri 0,1 to queue 2, pri 2 to queue 1 (class B), pri 3 to queue 0 
(class A) and everything else to queue 3. This is what I would expect, 
except for the additional 8 priorities.

>    <file> is multi-line, with each line being of the following format:
>    <cmd> <gate mask> <interval in nanoseconds>
> 
>    Qbv only defines one <cmd>: "S" for 'SetGates'
> 
>    For example:
> 
>    S 0x01 300
>    S 0x03 500
> 
>    This means that there are two intervals, the first will have the gate
>    for traffic class 0 open for 300 nanoseconds, the second will have
>    both traffic classes open for 500 nanoseconds.

Are you aware of any hw except dedicated switching stuff that supports 
this? (meant as "I'm curious and would like to know")

>    Additionally, an option to set just one entry of the gate control list will
>    also be provided by 'taprio':
> 
>    $ tc qdisc (...) \
>         sched-row <row number> <cmd> <gate mask> <interval>  \
>         [base-time <interval>] [cycle-time <interval>] \
>         [extension-time <interval>]
> 
> 
>  * Frame Preemption (802.1Qbu):

So frame preemption is nice, but my understanding of Qbu is that the real 
benefit is at the bridges and not in the endpoints. As jumbo frames are 
explicitly disallowed in Qav, the maximum latency incurred by a frame in 
flight is 12us on a 1Gbps link. I am not sure that these 12us will be 
the main delay in your application.

Or have I missed some crucial point here?

>    To control even further the latency, it may prove useful to signal which
>    traffic classes are marked as preemptable. For that, 'taprio' provides the
>    preemption command so you set each traffic class as preemptable or not:
> 
>    $ tc qdisc (...) \
>         preemption 0 1 1 1
> 
>  * Time-aware shaper + Preemption:
> 
>    As an example of how Qbv and Qbu can be used together, we may specify
>    both the schedule and the preempt-mask, and this way we may also
>    specify the Set-Gates-and-Hold and Set-Gates-and-Release commands as
>    specified in the Qbu spec:
> 
>    $ tc qdisc add dev ens4 parent root handle 100 taprio num_tc 4 \
>      	   map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3                    \
> 	   queues 0 1 2 3                                         \
>      	   preemption 0 1 1 1                                     \
> 	   sched-file preempt_gates.sched
> 
>     <file> is multi-line, with each line being of the following format:
>     <cmd> <gate mask> <interval in nanoseconds>
> 
>     For this case, two new commands are introduced:
> 
>     "H" for 'set gates and hold'
>     "R" for 'set gates and release'
> 
>     H 0x01 300
>     R 0x03 500

So my understanding of all of this is that you configure the *total* 
bandwidth for each class when you load the qdisc and then let userspace 
handle the rest. Is this correct?

In my view, it would be nice if the qdisc had some notion about streams so 
that you could create a stream, feed frames to it and let the driver pace 
them out. (The fewer you queue, the shorter the delay). This will also 
allow you to enforce per-stream bandwidth restrictions. I don't see how you 
can do this here unless you want to do this in userspace.

Do you have any plans for adding support for multiplexing streams? If you 
have multiple streams, how do you enforce that one stream does not eat into 
the bandwidth of another stream? AFAIK, this is something the network must 
enforce, but I see no option for doing so here.

> Testing this RFC
> ================
> 
> For testing the patches of this RFC only, you can refer to the samples and
> helper script being added to samples/tsn/ and the use the 'mqprio' qdisc to
> setup the priorities to Tx queues mapping, together with the 'cbs' qdisc to
> configure the HW shaper of the i210 controller:

I will test it, feedback will be provided soon! :)

Thanks!
-Henrik

> 1) Setup priorities to traffic classes to hardware queues mapping
> $ tc qdisc replace dev enp3s0 parent root mqprio num_tc 3 \
>      map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0
> 
> 2) Check scheme. You want to get the inner qdiscs ID from the bottom up
> $ tc -g  class show dev enp3s0
> 
> Ex.:
> +---(802a:3) mqprio
> |    +---(802a:6) mqprio
> |    +---(802a:7) mqprio
> |
> +---(802a:2) mqprio
> |    +---(802a:5) mqprio
> |
> +---(802a:1) mqprio
>      +---(802a:4) mqprio
> 
>  * Here '802a:4' is Tx Queue #0 and '802a:5' is Tx Queue #1.
> 
> 3) Calculate CBS parameters for classes A and B. i.e. BW for A is 20Mbps and
>    for B is 10Mbps:
> $ ./samples/tsn/calculate_cbs_params.py -A 20000 -a 1500 -B 10000 -b 1500
> 
> 4) Configure CBS for traffic class A (priority 3) as provided by the script:
> $ tc qdisc replace dev enp3s0 parent 802a:4 cbs locredit -1470 \
>      hicredit 30 sendslope -980000 idleslope 20000
> 
> 5) Configure CBS for traffic class B (priority 2):
> $ tc qdisc replace dev enp3s0 parent 802a:5 cbs \
>      locredit -1485 hicredit 31 sendslope -990000 idleslope 10000
> 
> 6) Run Listener, compiled from samples/tsn/listener.c
> $ ./listener -i enp3s0
> 
> 7) Run Talker for class A (prio 3 here), compiled from samples/tsn/talker.c
> $ ./talker -i enp3s0 -p 3
> 
>  * The bandwidth displayed on the listener output at this stage should be very
>    close to the one configured for class A.
> 
> 8) You can also run a Talker for class B (prio 2 here)
> $ ./talker -i enp3s0 -p 2
> 
>  * The bandwidth displayed on the listener output now should increase to very
>    close to the one configured for class A + class B.

Because you grab both class A *and* B, or because B will eat what A does 
not use?

-H

> Authors
> =======
>  - Andre Guedes <andre.guedes@intel.com>
>  - Ivan Briano <ivan.briano@intel.com>
>  - Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
>  - Vinicius Gomes <vinicius.gomes@intel.com>
> 
> 
> Andre Guedes (2):
>   igb: Add support for CBS offload
>   samples/tsn: Add script for calculating CBS config
> 
> Jesus Sanchez-Palencia (1):
>   sample: Add TSN Talker and Listener examples
> 
> Vinicius Costa Gomes (2):
>   net/sched: Introduce the user API for the CBS shaper
>   net/sched: Introduce Credit Based Shaper (CBS) qdisc
> 
>  drivers/net/ethernet/intel/igb/e1000_defines.h |  23 ++
>  drivers/net/ethernet/intel/igb/e1000_regs.h    |   8 +
>  drivers/net/ethernet/intel/igb/igb.h           |   6 +
>  drivers/net/ethernet/intel/igb/igb_main.c      | 349 +++++++++++++++++++++++++
>  include/linux/netdevice.h                      |   1 +
>  include/uapi/linux/pkt_sched.h                 |  29 ++
>  net/sched/Kconfig                              |  11 +
>  net/sched/Makefile                             |   1 +
>  net/sched/sch_cbs.c                            | 286 ++++++++++++++++++++
>  samples/tsn/calculate_cbs_params.py            | 112 ++++++++
>  samples/tsn/listener.c                         | 254 ++++++++++++++++++
>  samples/tsn/talker.c                           | 136 ++++++++++
>  12 files changed, 1216 insertions(+)
>  create mode 100644 net/sched/sch_cbs.c
>  create mode 100755 samples/tsn/calculate_cbs_params.py
>  create mode 100644 samples/tsn/listener.c
>  create mode 100644 samples/tsn/talker.c
> 
> --
> 2.14.1

-- 
Henrik Austad

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-07  5:34 ` Henrik Austad
@ 2017-09-07 12:40   ` Richard Cochran
  2017-09-07 15:27     ` Henrik Austad
  2017-09-07 19:58   ` Guedes, Andre
  2017-09-08  1:29   ` Vinicius Costa Gomes
  2 siblings, 1 reply; 45+ messages in thread
From: Richard Cochran @ 2017-09-07 12:40 UTC (permalink / raw)
  To: Henrik Austad
  Cc: Vinicius Costa Gomes, netdev, jhs, xiyou.wangcong, jiri,
	intel-wired-lan, andre.guedes, ivan.briano,
	jesus.sanchez-palencia, boon.leong.ong

On Thu, Sep 07, 2017 at 07:34:11AM +0200, Henrik Austad wrote:
> Also, does this mean that when you create the qdisc, you have locked the 
> bandwidth for the scheduler? Meaning, if I later want to add another 
> stream that requires more bandwidth, I have to close all active streams, 
> reconfigure the qdisc and then restart?

No, just allocate enough bandwidth to accommodate all of the expected
streams.  The streams can start and stop at will.

> So my understanding of all of this is that you configure the *total* 
> bandwith for each class when you load the qdisc and then let userspace 
> handle the rest. Is this correct?

Nothing wrong with that.
 
> In my view, it would be nice if the qdisc had some notion about streams so 
> that you could create a stream, feed frames to it and let the driver pace 
> them out. (The fewer you queue, the shorter the delay). This will also 
> allow you to enforce per-stream bandwidth restrictions. I don't see how you 
> can do this here unless you want to do this in userspace.
> 
> Do you have any plans for adding support for multiplexing streams? If you 
> have multiple streams, how do you enforce that one stream does not eat into 
> the bandwidth of another stream? AFAIK, this is something the network must 
> enforce, but I see no option of doing som here.

Please, let's keep this simple.  Today we have exactly zero user space
applications using this kind of bandwidth reservation.  The case of
wanting the kernel to police individual stream usage does not exist,
and probably never will.

For serious TSN use cases, the bandwidth needed by each system and
indeed the entire network will be engineered, and we can reasonably
expect applications to cooperate in this regard.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-07 12:40   ` Richard Cochran
@ 2017-09-07 15:27     ` Henrik Austad
  2017-09-07 15:53       ` Richard Cochran
  0 siblings, 1 reply; 45+ messages in thread
From: Henrik Austad @ 2017-09-07 15:27 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Vinicius Costa Gomes, netdev, jhs, xiyou.wangcong, jiri,
	intel-wired-lan, andre.guedes, ivan.briano,
	jesus.sanchez-palencia, boon.leong.ong

[-- Attachment #1: Type: text/plain, Size: 3054 bytes --]

On Thu, Sep 07, 2017 at 02:40:18PM +0200, Richard Cochran wrote:
> On Thu, Sep 07, 2017 at 07:34:11AM +0200, Henrik Austad wrote:
> > Also, does this mean that when you create the qdisc, you have locked the 
> > bandwidth for the scheduler? Meaning, if I later want to add another 
> > stream that requires more bandwidth, I have to close all active streams, 
> > reconfigure the qdisc and then restart?
> 
> No, just allocate enough bandwidth to accomodate all of the expected
> streams.  The streams can start and stop at will.

Sure, that'll work.

And if you want this driver to act as a bridge, how do you accommodate 
changes in network requirements? (i.e. how does this work with switchdev?)
- Or am I overthinking this?

> > So my understanding of all of this is that you configure the *total* 
> > bandwith for each class when you load the qdisc and then let userspace 
> > handle the rest. Is this correct?
> 
> Nothing wrong with that.

Didn't mean to say it was wrong, just making sure I've understood the 
concept.

> > In my view, it would be nice if the qdisc had some notion about streams so 
> > that you could create a stream, feed frames to it and let the driver pace 
> > them out. (The fewer you queue, the shorter the delay). This will also 
> > allow you to enforce per-stream bandwidth restrictions. I don't see how you 
> > can do this here unless you want to do this in userspace.
> > 
> > Do you have any plans for adding support for multiplexing streams? If you 
> > have multiple streams, how do you enforce that one stream does not eat into 
> > the bandwidth of another stream? AFAIK, this is something the network must 
> > enforce, but I see no option of doing som here.
> 
> Please, lets keep this simple. 

Simple is always good

> Today we have exactly zero user space
> applications using this kind of bandwidth reservation.  The case of
> wanting the kernel to police individual stream usage does not exist,
> and probably never will.

That we have *zero* userspace applications today is probably related to the 
fact that we have exactly *zero* drivers in the kernel that talk TSN :)

To rephrase a bit, what I'm worried about:

If you have more than 1 application in userspace that wants to send data 
using this scheduler, how do you ensure fair transmission of frames? (both 
how much bandwidth they use, but also ordering of frames from each 
application) Do you expect all of this to be handled in userspace?

> For serious TSN use cases, the bandwidth needed by each system and
> indeed the entire network will be engineered, and we can reasonably
> expect applications to cooperate in this regard.

yes.. that'll happen ;)

> Thanks,
> Richard

Don't get me wrong, I think it is great that others are working on this!
I'm just trying to fully understand the thought that has gone into this 
and how it is intended to be used.

I'll get busy testing the code and wrapping my head around the different 
parameters.

-- 
Henrik Austad

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-07 15:27     ` Henrik Austad
@ 2017-09-07 15:53       ` Richard Cochran
  2017-09-07 16:18         ` Henrik Austad
  0 siblings, 1 reply; 45+ messages in thread
From: Richard Cochran @ 2017-09-07 15:53 UTC (permalink / raw)
  To: Henrik Austad
  Cc: Vinicius Costa Gomes, netdev, jhs, xiyou.wangcong, jiri,
	intel-wired-lan, andre.guedes, ivan.briano,
	jesus.sanchez-palencia, boon.leong.ong

On Thu, Sep 07, 2017 at 05:27:51PM +0200, Henrik Austad wrote:
> On Thu, Sep 07, 2017 at 02:40:18PM +0200, Richard Cochran wrote:
> And if you want to this driver to act as a bridge, how do you accomodate 
> change in network requirements? (i.e. how does this work with switchdev?)

To my understanding, this Qdisc idea provides QoS for the host's
transmitted traffic, and nothing more.

> - Or am I overthinking this?

Being able to configure the external ports of a switchdev is probably
a nice feature, but that is another story.  (But maybe I misunderstood
the authors' intent!)

> If you have more than 1 application in userspace that wants to send data 
> using this scheduler, how do you ensure fair transmission of frames? (both 
> how much bandwidth they use,

There are many ways to handle this, and we shouldn't put any of that
policy into the kernel.  For example, there might be a monolithic
application with configurable threads, or an allocation server that
grants bandwidth to applications via IPC, or a multiplexing stream
server like jack, pulse, etc, and so on...

> but also ordering of frames from each application)

Not sure what you mean by this.

> Do you expect all of this to be handled in userspace?

Yes, I do.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-07 15:53       ` Richard Cochran
@ 2017-09-07 16:18         ` Henrik Austad
  2017-09-07 21:51           ` Guedes, Andre
  0 siblings, 1 reply; 45+ messages in thread
From: Henrik Austad @ 2017-09-07 16:18 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Vinicius Costa Gomes, netdev, jhs, xiyou.wangcong, jiri,
	intel-wired-lan, andre.guedes, ivan.briano,
	jesus.sanchez-palencia, boon.leong.ong

[-- Attachment #1: Type: text/plain, Size: 2300 bytes --]

On Thu, Sep 07, 2017 at 05:53:15PM +0200, Richard Cochran wrote:
> On Thu, Sep 07, 2017 at 05:27:51PM +0200, Henrik Austad wrote:
> > On Thu, Sep 07, 2017 at 02:40:18PM +0200, Richard Cochran wrote:
> > And if you want to this driver to act as a bridge, how do you accomodate 
> > change in network requirements? (i.e. how does this work with switchdev?)
> 
> To my understanding, this Qdisc idea provides QoS for the host's
> transmitted traffic, and nothing more.

Ok, then we're on the same page.

> > - Or am I overthinking this?
> 
> Being able to configure the external ports of a switchdev is probably
> a nice feature, but that is another story.  (But maybe I misunderstood
> the authors' intent!)

ok, chalk that one up for later perhaps

> > If you have more than 1 application in userspace that wants to send data 
> > using this scheduler, how do you ensure fair transmission of frames? (both 
> > how much bandwidth they use,
> 
> There are many ways to handle this, and we shouldn't put any of that
> policy into the kernel.  For example, there might be a monolithic
> application with configurable threads, or an allocation server that
> grants bandwidth to applications via IPC, or a multiplexing stream
> server like jack, pulse, etc, and so on...

true

> > but also ordering of frames from each application)
> 
> Not sure what you mean by this.

Fair enough, I'm not that good at making myself clear :)

Let's see if I can make a better attempt:

If you have 2 separate applications that have their own streams going to 
different endpoints - but both are in the same class, then they will 
share the qdisc bandwidth.

So application 
- A sends frame A1, A2, A3, .. An
- B sends B1, B2, .. Bn

What I was trying to describe was: if application A sends 2 frames, and B 
sends 2 frames at the same time, then you would hope that the order would 
be A1, B1, A2, B2, and not A1, A2, B1, B2.

None of this would be a problem if you expect a *single* user, like the 
allocation server you described above. Again, I think this is just me 
overthinking the problem right now :)

> > Do you expect all of this to be handled in userspace?
> 
> Yes, I do.

ok, fair enough

Thanks for answering my questions!

-- 
Henrik Austad

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-07  5:34 ` Henrik Austad
  2017-09-07 12:40   ` Richard Cochran
@ 2017-09-07 19:58   ` Guedes, Andre
  2017-09-08  6:06     ` Henrik Austad
  2017-09-08  1:29   ` Vinicius Costa Gomes
  2 siblings, 1 reply; 45+ messages in thread
From: Guedes, Andre @ 2017-09-07 19:58 UTC (permalink / raw)
  To: Gomes, Vinicius, henrik
  Cc: jiri, jhs, Ong, Boon Leong, xiyou.wangcong, Sanchez-Palencia,
	Jesus, richardcochran, netdev, Briano, Ivan, intel-wired-lan

[-- Attachment #1: Type: text/plain, Size: 3398 bytes --]

Hi Henrik,

Thanks for your feedback! I'll address some of your comments below.

On Thu, 2017-09-07 at 07:34 +0200, Henrik Austad wrote:
> > As for the shapers config interface:
> > 
> >  * CBS (802.1Qav)
> > 
> >    This patchset is proposing a new qdisc called 'cbs'. Its 'tc' cmd line
> > is:
> >    $ tc qdisc add dev IFACE parent ID cbs locredit N hicredit M sendslope S
> > \
> >      idleslope I
> 
> So this confuses me a bit, why specify sendSlope?
> 
>     sendSlope = portTransmitRate - idleSlope
> 
> and portTransmitRate is the speed of the MAC (which you get from the 
> driver). Adding sendSlope here is just redundant I think.

Yes, this was something we spent quite some time discussing before this RFC
series. After reading Annex L from 802.1Q-2014 (operation of the CBS algorithm)
so many times, we came up with the rationale explained below.

The rationale here is that sendSlope is just another parameter of the CBS
algorithm, like idleSlope, hiCredit and loCredit. As such, its calculation
should be done at the same "layer" as the other parameters (in this case, user
space) in order to keep consistency. Moreover, in this design, the driver layer
is dead simple: all the device driver has to do is apply the CBS parameters to
hardware. Having any CBS parameter calculation in the driver layer means all
device drivers must implement that calculation.

> Also, does this mean that when you create the qdisc, you have locked the 
> bandwidth for the scheduler? Meaning, if I later want to add another 
> stream that requires more bandwidth, I have to close all active streams, 
> reconfigure the qdisc and then restart?

If we want to reserve more bandwidth to "accommodate" a new stream, we don't
need to close all active streams. All we have to do is change the CBS qdisc
and pass the new CBS parameters. Here is what the command line would look like:

$ tc qdisc change dev enp0s4 parent 8001:5 cbs locredit -1470 hicredit 30 \
     sendslope -980000 idleslope 20000

No application/stream is interrupted while the new CBS parameters are applied.

> >    Note that the parameters for this qdisc are the ones defined by the
> >    802.1Q-2014 spec, so no hardware specific functionality is exposed here.
> 
> You do need to know if the link is brought up as 100 or 1000 though - which 
> the driver already knows.

User space knows that information via ethtool or /sys.
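
To make the split concrete, here is a minimal C sketch of that userspace side
(a sketch only, not the sample script): it reads the link speed from sysfs and
derives sendSlope plus the class A credits with the same equations the sample
script uses. The interface name, idleSlope and frame sizes below are
illustrative only.

#include <math.h>
#include <stdio.h>

int main(void)
{
        double link_speed;              /* kbps */
        double idleslope_a = 20000.0;   /* kbps, e.g. 20 Mbit/s for class A */
        double frame_non_sr = 1500.0;   /* bytes, max non-SR frame */
        double frame_a = 1500.0;        /* bytes, max class A frame */
        FILE *f = fopen("/sys/class/net/enp3s0/speed", "r");

        if (!f)
                return 1;
        if (fscanf(f, "%lf", &link_speed) != 1) {
                fclose(f);
                return 1;
        }
        fclose(f);
        link_speed *= 1000.0;           /* sysfs reports Mbit/s, math is in kbps */

        double sendslope = idleslope_a - link_speed;
        double hicredit = ceil(idleslope_a * frame_non_sr / link_speed);
        double locredit = ceil(sendslope * frame_a / link_speed);

        printf("sendslope %.0f idleslope %.0f hicredit %.0f locredit %.0f\n",
               sendslope, idleslope_a, hicredit, locredit);
        return 0;
}

For the 20 Mbit/s class A example from the test instructions on a 1 Gbit/s
link this prints sendslope -980000, hicredit 30 and locredit -1470, matching
step 4 there; the class B case adds the Mo / (Ro - Ra) term from the script
and gives hicredit 31 / locredit -1485 for the 10 Mbit/s example.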

> > Testing this RFC
> > ================
> > 
> > For testing the patches of this RFC only, you can refer to the samples and
> > helper script being added to samples/tsn/ and the use the 'mqprio' qdisc to
> > setup the priorities to Tx queues mapping, together with the 'cbs' qdisc to
> > configure the HW shaper of the i210 controller:
> 
> I will test it, feedback will be provided soon! :)

That's great! Please let us know if you find any issues, and thanks for your
support.

> > 8) You can also run a Talker for class B (prio 2 here)
> > $ ./talker -i enp3s0 -p 2
> > 
> >  * The bandwidth displayed on the listener output now should increase to
> > very
> >    close to the one configured for class A + class B.
> 
> Because you grab both class A *and* B, or because B will eat what A does 
> not use?

Because the listener application grabs both class A and B traffic.

Regards,

Andre

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 3262 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-07 16:18         ` Henrik Austad
@ 2017-09-07 21:51           ` Guedes, Andre
  0 siblings, 0 replies; 45+ messages in thread
From: Guedes, Andre @ 2017-09-07 21:51 UTC (permalink / raw)
  To: richardcochran, henrik
  Cc: jiri, jhs, Ong, Boon Leong, xiyou.wangcong, Gomes, Vinicius,
	Sanchez-Palencia, Jesus, netdev, Briano, Ivan, intel-wired-lan

[-- Attachment #1: Type: text/plain, Size: 1077 bytes --]

On Thu, 2017-09-07 at 18:18 +0200, Henrik Austad wrote:
> On Thu, Sep 07, 2017 at 05:53:15PM +0200, Richard Cochran wrote:
> > On Thu, Sep 07, 2017 at 05:27:51PM +0200, Henrik Austad wrote:
> > > On Thu, Sep 07, 2017 at 02:40:18PM +0200, Richard Cochran wrote:
> > > And if you want to this driver to act as a bridge, how do you accomodate 
> > > change in network requirements? (i.e. how does this work with switchdev?)
> > 
> > To my understanding, this Qdisc idea provides QoS for the host's
> > transmitted traffic, and nothing more.
> 
> Ok, then we're on the same page.
> 
> > > - Or am I overthinking this?
> > 
> > Being able to configure the external ports of a switchdev is probably
> > a nice feature, but that is another story.  (But maybe I misunderstood
> > the authors' intent!)
> 
> ok, chalk that one up for later perhaps

Just to clarify, we've been mostly focused on end-station use-cases. We've
considered some bridge use-cases as well, just to verify that the proposed
design won't be an issue if someone else goes for it.

- Andre

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 3262 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-07  5:34 ` Henrik Austad
  2017-09-07 12:40   ` Richard Cochran
  2017-09-07 19:58   ` Guedes, Andre
@ 2017-09-08  1:29   ` Vinicius Costa Gomes
  2017-09-12  4:56     ` Richard Cochran
  2 siblings, 1 reply; 45+ messages in thread
From: Vinicius Costa Gomes @ 2017-09-08  1:29 UTC (permalink / raw)
  To: Henrik Austad
  Cc: netdev, jhs, xiyou.wangcong, jiri, intel-wired-lan, andre.guedes,
	ivan.briano, jesus.sanchez-palencia, boon.leong.ong,
	richardcochran

Henrik Austad <henrik@austad.us> writes:

> On Thu, Aug 31, 2017 at 06:26:20PM -0700, Vinicius Costa Gomes wrote:
>> Hi,
>>
>> This patchset is an RFC on a proposal of how the Traffic Control subsystem can
>> be used to offload the configuration of traffic shapers into network devices
>> that provide support for them in HW. Our goal here is to start upstreaming
>> support for features related to the Time-Sensitive Networking (TSN) set of
>> standards into the kernel.
>
> Nice to see that others are working on this as well! :)
>
> A short disclaimer; I'm pretty much anchored in the view "linux is the
> end-station in a TSN domain", is this your approach as well, or are you
> looking at this driver to be used in bridges as well? (because that will
> affect the comments on time-aware shaper and frame preemption)
>
> Yet another disclaimer; I am not a linux networking subsystem expert. Not
> by a long shot! There are black magic happening in the internals of the
> networking subsystem that I am not even aware of. So if something I say or
> ask does not make sense _at_all_, that's probably why..
>
> I do know a tiny bit about TSN though, and I have been messing around
> with it for a little while, hence my comments below
>
>> As part of this work, we've assessed previous public discussions related to TSN
>> enabling: patches from Henrik Austad (Cisco), the presentation from Eric Mann
>> at Linux Plumbers 2012, patches from Gangfeng Huang (National Instruments) and
>> the current state of the OpenAVNU project (https://github.com/AVnu/OpenAvnu/).
>
> /me eyes Cc ;p
>
>> Overview
>> ========
>>
>> Time-sensitive Networking (TSN) is a set of standards that aim to address
>> resources availability for providing bandwidth reservation and bounded latency
>> on Ethernet based LANs. The proposal described here aims to cover mainly what is
>> needed to enable the following standards: 802.1Qat, 802.1Qav, 802.1Qbv and
>> 802.1Qbu.
>>
>> The initial target of this work is the Intel i210 NIC, but other controllers'
>> datasheet were also taken into account, like the Renesas RZ/A1H RZ/A1M group and
>> the Synopsis DesignWare Ethernet QoS controller.
>
> NXP has a TSN aware chip on the i.MX7 sabre board as well </fyi>

Cool. Will take a look.

>
>> Proposal
>> ========
>>
>> Feature-wise, what is covered here are configuration interfaces for HW
>> implementations of the Credit-Based shaper (CBS, 802.1Qav), Time-Aware shaper
>> (802.1Qbv) and Frame Preemption (802.1Qbu). CBS is a per-queue shaper, while
>> Qbv and Qbu must be configured per port, with the configuration covering all
>> queues. Given that these features are related to traffic shaping, and that the
>> traffic control subsystem already provides a queueing discipline that offloads
>> config into the device driver (i.e. mqprio), designing new qdiscs for the
>> specific purpose of offloading the config for each shaper seemed like a good
>> fit.
>
> just to be clear, you register sch_cbs as a subclass to mqprio, not as a
> root class?

That's right.

>
>> For steering traffic into the correct queues, we use the socket option
>> SO_PRIORITY and then a mechanism to map priority to traffic classes / Tx queues.
>> The qdisc mqprio is currently used in our tests.
>
> Right, fair enough, I'd prefer the TSN qdisc to be the root-device and
> rather have mqprio for high priority traffic and another for 'everything
> else'', but this would work too. This is not that relevant at this stage I
> guess :)

That's a scenario I haven't considered, will give it some thought.

>
>> As for the shapers config interface:
>>
>>  * CBS (802.1Qav)
>>
>>    This patchset is proposing a new qdisc called 'cbs'. Its 'tc' cmd line is:
>>    $ tc qdisc add dev IFACE parent ID cbs locredit N hicredit M sendslope S \
>>      idleslope I
>
> So this confuses me a bit, why specify sendSlope?
>
>     sendSlope = portTransmitRate - idleSlope
>
> and portTransmitRate is the speed of the MAC (which you get from the
> driver). Adding sendSlope here is just redundant I think.
>
> Also, does this mean that when you create the qdisc, you have locked the
> bandwidth for the scheduler? Meaning, if I later want to add another
> stream that requires more bandwidth, I have to close all active streams,
> reconfigure the qdisc and then restart?
>
>>    Note that the parameters for this qdisc are the ones defined by the
>>    802.1Q-2014 spec, so no hardware specific functionality is exposed here.
>
> You do need to know if the link is brought up as 100 or 1000 though - which
> the driver already knows.
>
>>  * Time-aware shaper (802.1Qbv):
>>
>>    The idea we are currently exploring is to add a "time-aware", priority based
>>    qdisc, that also exposes the Tx queues available and provides a mechanism for
>>    mapping priority <-> traffic class <-> Tx queues in a similar fashion as
>>    mqprio. We are calling this qdisc 'taprio', and its 'tc' cmd line would be:
>
> As far as I know, this is not supported by i210, and if time-aware shaping
> is enabled in the network - you'll be queued on a bridge until the window
> opens as time-aware shaping is enforced on the tx-port and not on rx. Is
> this required in this driver?

Yeah, i210 doesn't support the time-aware shaper. I think the second
part of your question doesn't really apply, then.

>
>>    $ $ tc qdisc add dev ens4 parent root handle 100 taprio num_tc 4    \
>>      	   map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3                         \
>> 	   queues 0 1 2 3                                              \
>>      	   sched-file gates.sched [base-time <interval>]               \
>>            [cycle-time <interval>] [extension-time <interval>]
>
> That was a lot of priorities! 802.1Q lists 8 priorities, where does these
> 16 come from?

Even though 802.1Q only defines 8 priorities, the Linux network stack
supports a lot more (and this command line is more than slightly
inspired by the mqprio equivalent).
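
For reference, the steering itself is just SO_PRIORITY on the sending socket,
which is presumably what the talker sample does with its -p option. A minimal
sketch (illustrative socket type and priority value; this is not talker.c):

#include <sys/socket.h>
#include <stdio.h>

int main(void)
{
        int prio = 3;   /* prio 3 -> traffic class 0 -> Tx queue 0 in the example map */
        int fd = socket(AF_INET, SOCK_DGRAM, 0);

        if (fd < 0)
                return 1;
        if (setsockopt(fd, SOL_SOCKET, SO_PRIORITY, &prio, sizeof(prio)) < 0) {
                perror("setsockopt(SO_PRIORITY)");
                return 1;
        }
        /* ...send as usual; frames from this socket now hit the queue that
         * mqprio mapped for priority 3, and the cbs qdisc shapes them there.
         */
        return 0;
}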

>
> You map pri 0,1 to queue 2, pri 2 to queue 1 (Class B), pri 3 to queue 0
> (class A) and everythign else to queue 3. This is what I would expect,
> except for the additional 8 priorities.
>
>>    <file> is multi-line, with each line being of the following format:
>>    <cmd> <gate mask> <interval in nanoseconds>
>>
>>    Qbv only defines one <cmd>: "S" for 'SetGates'
>>
>>    For example:
>>
>>    S 0x01 300
>>    S 0x03 500
>>
>>    This means that there are two intervals, the first will have the gate
>>    for traffic class 0 open for 300 nanoseconds, the second will have
>>    both traffic classes open for 500 nanoseconds.
>
> Are you aware of any hw except dedicated switching stuff that supports
> this? (meant as "I'm curious and would like to know")

Not really. I couldn't find any public documentation about products
destined for end stations that support this. I, too, would like to know
more.

>
>>    Additionally, an option to set just one entry of the gate control list will
>>    also be provided by 'taprio':
>>
>>    $ tc qdisc (...) \
>>         sched-row <row number> <cmd> <gate mask> <interval>  \
>>         [base-time <interval>] [cycle-time <interval>] \
>>         [extension-time <interval>]
>>
>>
>>  * Frame Preemption (802.1Qbu):
>
> So Frame preemption is nice, but my understanding of Qbu is that the real
> benefit is at the bridges and not in the endpoints. As jumbo-frames is
> explicitly disallowed in Qav, the maximum latency incurred by a frame in
> flight is 12us on a 1Gbps link. I am not sure if these 12us is what will be
> the main delay in your application.
>
> Or have I missed some crucial point here?


You don't seem to have missed anything. The biggest point I see for frame
preemption is when it is used with scheduled traffic: you could keep the
gates of the preemptable traffic classes always open, have a few time
windows for periodic traffic, and still have predictable behaviour for
unscheduled "emergency" traffic.


Cheers,

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-07 19:58   ` Guedes, Andre
@ 2017-09-08  6:06     ` Henrik Austad
  0 siblings, 0 replies; 45+ messages in thread
From: Henrik Austad @ 2017-09-08  6:06 UTC (permalink / raw)
  To: Guedes, Andre
  Cc: Gomes, Vinicius, jiri, jhs, Ong, Boon Leong, xiyou.wangcong,
	Sanchez-Palencia, Jesus, richardcochran, netdev, Briano, Ivan,
	intel-wired-lan

[-- Attachment #1: Type: text/plain, Size: 3863 bytes --]

On Thu, Sep 07, 2017 at 07:58:53PM +0000, Guedes, Andre wrote:
> Hi Henrik,
> 
> Thanks for your feedback! I'll address some of your comments below.
> 
> On Thu, 2017-09-07 at 07:34 +0200, Henrik Austad wrote:
> > > As for the shapers config interface:
> > > 
> > >  * CBS (802.1Qav)
> > > 
> > >    This patchset is proposing a new qdisc called 'cbs'. Its 'tc' cmd line
> > > is:
> > >    $ tc qdisc add dev IFACE parent ID cbs locredit N hicredit M sendslope S
> > > \
> > >      idleslope I
> > 
> > So this confuses me a bit, why specify sendSlope?
> > 
> >     sendSlope = portTransmitRate - idleSlope
> > 
> > and portTransmitRate is the speed of the MAC (which you get from the 
> > driver). Adding sendSlope here is just redundant I think.
> 
> Yes, this was something we've spent quite a few time discussing before this RFC
> series. After reading the Annex L from 802.1Q-2014 (operation of CBS algorithm)
> so many times, we've came up with the rationale explained below.
> 
> The rationale here is that sendSlope is just another parameter from CBS
> algorithm like idleSlope, hiCredit and loCredit. As such, its calculation
> should be done at the same "layer" as the others parameters (in this case, user
> space) in order to keep consistency. Moreover, in this design, the driver layer
> is dead simple: all the device driver has to do is applying CBS parameters to
> hardware. Having any CBS parameter calculation in the driver layer means all
> device drivers must implement that calculation.

Ok, that actually makes a lot of sense, and anything that keeps this kind 
of arithmetic outside the kernel is a good thing!

Thanks for the clarification!

> > Also, does this mean that when you create the qdisc, you have locked the 
> > bandwidth for the scheduler? Meaning, if I later want to add another 
> > stream that requires more bandwidth, I have to close all active streams, 
> > reconfigure the qdisc and then restart?
> 
> If we want to reserve more bandwidth to "accommodate" a new stream, we don't
> need to close all active streams. All we have to do is changing the CBS qdisc
> and pass the new CBS parameters. Here is what the command-line would look like:
> 
> $ tc qdisc change dev enp0s4 parent 8001:5 cbs locredit -1470 hicredit 30
> sendslope -980000 idleslope 20000
> 
> No application/stream is interrupted while new CBS parameters are applied.

Ah, good.

> > >    Note that the parameters for this qdisc are the ones defined by the
> > >    802.1Q-2014 spec, so no hardware specific functionality is exposed here.
> > 
> > You do need to know if the link is brought up as 100 or 1000 though - which 
> > the driver already knows.
> 
> User space knows that information via ethtool or /sys.

Fair point.

> > > Testing this RFC
> > > ================
> > > 
> > > For testing the patches of this RFC only, you can refer to the samples and
> > > helper script being added to samples/tsn/ and the use the 'mqprio' qdisc to
> > > setup the priorities to Tx queues mapping, together with the 'cbs' qdisc to
> > > configure the HW shaper of the i210 controller:
> > 
> > I will test it, feedback will be provided soon! :)
> 
> That's great! Please let us know if you find any issue and thanks for you
> support.
> 
> > > 8) You can also run a Talker for class B (prio 2 here)
> > > $ ./talker -i enp3s0 -p 2
> > > 
> > >  * The bandwidth displayed on the listener output now should increase to
> > > very
> > >    close to the one configured for class A + class B.
> > 
> > Because you grab both class A *and* B, or because B will eat what A does 
> > not use?
> 
> Because the listener application grabs both class A and B traffic.

Right, got it.

Thanks for the feedback, I'm getting really excited about this! :D

-- 
Henrik Austad

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 2/5] net/sched: Introduce Credit Based Shaper (CBS) qdisc
  2017-09-01  1:26 ` [RFC net-next 2/5] net/sched: Introduce Credit Based Shaper (CBS) qdisc Vinicius Costa Gomes
@ 2017-09-08 13:43   ` Henrik Austad
  2017-09-14  0:39     ` Vinicius Costa Gomes
  0 siblings, 1 reply; 45+ messages in thread
From: Henrik Austad @ 2017-09-08 13:43 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, jhs, xiyou.wangcong, jiri, intel-wired-lan, andre.guedes,
	ivan.briano, jesus.sanchez-palencia, boon.leong.ong,
	richardcochran

[-- Attachment #1: Type: text/plain, Size: 11796 bytes --]

On Thu, Aug 31, 2017 at 06:26:22PM -0700, Vinicius Costa Gomes wrote:
> This queueing discipline implements the shaper algorithm defined by
> the 802.1Q-2014 Section 8.6.8.2 and detailed in Annex L.
> 
> It's primary usage is to apply some bandwidth reservation to user
> defined traffic classes, which are mapped to different queues via the
> mqprio qdisc.
> 
> Initially, it only supports offloading the traffic shaping work to
> supporting controllers.
> 
> Later, when a software implementation is added, the current dependency
> on being installed "under" mqprio can be lifted.
> 
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>

So, I started testing this in my VM to make sure I didn't do anything silly, 
and it exploded spectacularly as it used the underlying e1000 driver, which 
does not have an ndo_setup_tc present.

I commented inline below, but as a tl;dr

cbs_init() would call into cbs_change(), which would correctly detect 
that ndo_setup_tc is missing and abort. However, at this stage it would try 
to destroy the qdisc, and in cbs_destroy() there's no such guard, leading to a

BUG: unable to handle kernel NULL pointer dereference at 0000000000000010

Fixing that, another NULL would be found when the code walks into 
qdisc_destroy(q->qdisc).

Apart from this, it loaded fine after some wrestling with tc :)

Awesome! :D

> ---
>  include/linux/netdevice.h |   1 +
>  net/sched/Kconfig         |  11 ++
>  net/sched/Makefile        |   1 +
>  net/sched/sch_cbs.c       | 286 ++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 299 insertions(+)
>  create mode 100644 net/sched/sch_cbs.c
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 35de8312e0b5..dd9a2ecd0c03 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -775,6 +775,7 @@ enum tc_setup_type {
>  	TC_SETUP_CLSFLOWER,
>  	TC_SETUP_CLSMATCHALL,
>  	TC_SETUP_CLSBPF,
> +	TC_SETUP_CBS,
>  };
>  
>  /* These structures hold the attributes of xdp state that are being passed
> diff --git a/net/sched/Kconfig b/net/sched/Kconfig
> index e70ed26485a2..c03d86a7775e 100644
> --- a/net/sched/Kconfig
> +++ b/net/sched/Kconfig
> @@ -172,6 +172,17 @@ config NET_SCH_TBF
>  	  To compile this code as a module, choose M here: the
>  	  module will be called sch_tbf.
>  
> +config NET_SCH_CBS

Shouldn't this depend on NET_SCH_MQPRIO as it is supposed to hook into 
that?

@@ -173,6 +173,7 @@ config NET_SCH_TBF
          module will be called sch_tbf.
 
 config NET_SCH_CBS
+       depends on NET_SCH_MQPRIO
        tristate "Credit Based Shaper (CBS)"
        ---help---
          Say Y here if you want to use the Credit Based Shaper (CBS) packet

> +	tristate "Credit Based Shaper (CBS)"
> +	---help---
> +	  Say Y here if you want to use the Credit Based Shaper (CBS) packet
> +	  scheduling algorithm.
> +
> +	  See the top of <file:net/sched/sch_cbs.c> for more details.
> +
> +	  To compile this code as a module, choose M here: the
> +	  module will be called sch_cbs.
> +
>  config NET_SCH_GRED
>  	tristate "Generic Random Early Detection (GRED)"
>  	---help---
> diff --git a/net/sched/Makefile b/net/sched/Makefile
> index 7b915d226de7..80c8f92d162d 100644
> --- a/net/sched/Makefile
> +++ b/net/sched/Makefile
> @@ -52,6 +52,7 @@ obj-$(CONFIG_NET_SCH_FQ_CODEL)	+= sch_fq_codel.o
>  obj-$(CONFIG_NET_SCH_FQ)	+= sch_fq.o
>  obj-$(CONFIG_NET_SCH_HHF)	+= sch_hhf.o
>  obj-$(CONFIG_NET_SCH_PIE)	+= sch_pie.o
> +obj-$(CONFIG_NET_SCH_CBS)	+= sch_cbs.o
>  
>  obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
>  obj-$(CONFIG_NET_CLS_ROUTE4)	+= cls_route.o
> diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
> new file mode 100644
> index 000000000000..1c86a9e14150
> --- /dev/null
> +++ b/net/sched/sch_cbs.c
> @@ -0,0 +1,286 @@
> +/*
> + * net/sched/sch_cbs.c	Credit Based Shaper
> + *
> + *		This program is free software; you can redistribute it and/or
> + *		modify it under the terms of the GNU General Public License
> + *		as published by the Free Software Foundation; either version
> + *		2 of the License, or (at your option) any later version.
> + *
> + * Authors:	Vininicius Costa Gomes <vinicius.gomes@intel.com>
> + *
> + */
> +
> +#include <linux/module.h>
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/string.h>
> +#include <linux/errno.h>
> +#include <linux/skbuff.h>
> +#include <net/netlink.h>
> +#include <net/sch_generic.h>
> +#include <net/pkt_sched.h>
> +
> +struct cbs_sched_data {
> +	struct Qdisc *qdisc; /* Inner qdisc, default - pfifo queue */
> +	s32 queue;
> +	s32 locredit;
> +	s32 hicredit;
> +	s32 sendslope;
> +	s32 idleslope;
> +};
> +
> +static int cbs_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> +		       struct sk_buff **to_free)
> +{
> +	struct cbs_sched_data *q = qdisc_priv(sch);
> +	int ret;
> +
> +	ret = qdisc_enqueue(skb, q->qdisc, to_free);
> +	if (ret != NET_XMIT_SUCCESS) {
> +		if (net_xmit_drop_count(ret))
> +			qdisc_qstats_drop(sch);
> +		return ret;
> +	}
> +
> +	qdisc_qstats_backlog_inc(sch, skb);
> +	sch->q.qlen++;
> +	return NET_XMIT_SUCCESS;
> +}
> +
> +static struct sk_buff *cbs_dequeue(struct Qdisc *sch)
> +{
> +	struct cbs_sched_data *q = qdisc_priv(sch);
> +	struct sk_buff *skb;
> +
> +	skb = q->qdisc->ops->peek(q->qdisc);
> +	if (skb) {
> +		skb = qdisc_dequeue_peeked(q->qdisc);
> +		if (unlikely(!skb))
> +			return NULL;
> +
> +		qdisc_qstats_backlog_dec(sch, skb);
> +		sch->q.qlen--;
> +		qdisc_bstats_update(sch, skb);
> +
> +		return skb;
> +	}
> +	return NULL;
> +}
> +
> +static void cbs_reset(struct Qdisc *sch)
> +{
> +	struct cbs_sched_data *q = qdisc_priv(sch);
> +
> +	qdisc_reset(q->qdisc);
> +}
> +
> +static const struct nla_policy cbs_policy[TCA_CBS_MAX + 1] = {
> +	[TCA_CBS_PARMS]	= { .len = sizeof(struct tc_cbs_qopt) },
> +};
> +
> +static int cbs_change(struct Qdisc *sch, struct nlattr *opt)
> +{
> +	struct cbs_sched_data *q = qdisc_priv(sch);
> +	struct tc_cbs_qopt_offload cbs = { };
> +	struct nlattr *tb[TCA_CBS_MAX + 1];
> +	const struct net_device_ops *ops;
> +	struct tc_cbs_qopt *qopt;
> +	struct net_device *dev;
> +	int err;
> +
> +	err = nla_parse_nested(tb, TCA_CBS_MAX, opt, cbs_policy, NULL);
> +	if (err < 0)
> +		return err;
> +
> +	err = -EINVAL;
> +	if (!tb[TCA_CBS_PARMS])
> +		goto done;
> +
> +	qopt = nla_data(tb[TCA_CBS_PARMS]);
> +
> +	dev = qdisc_dev(sch);
> +	ops = dev->netdev_ops;
> +
> +	/* FIXME: this means that we can only install this qdisc
> +	 * "under" mqprio. Do we need a more generic way to retrieve
> +	 * the queue, or do we pass the netdev_queue to the driver?
> +	 */
> +	cbs.queue = TC_H_MIN(sch->parent) - 1 - netdev_get_num_tc(dev);
> +
> +	cbs.enable = 1;
> +	cbs.hicredit = qopt->hicredit;
> +	cbs.locredit = qopt->locredit;
> +	cbs.idleslope = qopt->idleslope;
> +	cbs.sendslope = qopt->sendslope;
> +
> +	err = -ENOTSUPP;
> +	if (!ops->ndo_setup_tc)
> +		goto done;

This confuses tc a bit, and looking at other qdisc schedulers, it seems 
like EOPNOTSUPP is an alternative, this changes the output from

RTNETLINK answers: Unknown error 524
 - to
RTNETLINK answers: Operation not supported

which perhaps is more informative.
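
I.e. the change would just be (sketch):

	err = -EOPNOTSUPP;
	if (!ops->ndo_setup_tc)
		goto done;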

> +
> +	err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CBS, &cbs);
> +	if (err < 0)
> +		goto done;
> +
> +	q->queue = cbs.queue;
> +	q->hicredit = cbs.hicredit;
> +	q->locredit = cbs.locredit;
> +	q->idleslope = cbs.idleslope;
> +	q->sendslope = cbs.sendslope;
> +
> +done:
> +	return err;
> +}
> +
> +static int cbs_init(struct Qdisc *sch, struct nlattr *opt)
> +{
> +	struct cbs_sched_data *q = qdisc_priv(sch);
> +
> +	if (!opt)
> +		return -EINVAL;
> +
> +	q->qdisc = fifo_create_dflt(sch, &pfifo_qdisc_ops, 1024);
> +	qdisc_hash_add(q->qdisc, true);
> +
> +	return cbs_change(sch, opt);
> +}
> +
> +static void cbs_destroy(struct Qdisc *sch)
> +{
> +	struct cbs_sched_data *q = qdisc_priv(sch);
> +	struct tc_cbs_qopt_offload cbs = { };
> +	struct net_device *dev;
> +	int err;
> +
> +	q->hicredit = 0;
> +	q->locredit = 0;
> +	q->idleslope = 0;
> +	q->sendslope = 0;
> +
> +	dev = qdisc_dev(sch);
> +
> +	cbs.queue = q->queue;
> +	cbs.enable = 0;
> +
> +	err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CBS, &cbs);

This can lead to NULL pointer deref if it is not set (tested for in 
cbs_change and error there leads us here, which then explodes).

> +	if (err < 0)
> +		pr_warn("Couldn't reset queue %d to default values\n",
> +			cbs.queue);
> +
> +	qdisc_destroy(q->qdisc);

Same, q->qdisc was NULL when cbs_init() failed.

> +}
> +
> +static int cbs_dump(struct Qdisc *sch, struct sk_buff *skb)
> +{
> +	struct cbs_sched_data *q = qdisc_priv(sch);
> +	struct nlattr *nest;
> +	struct tc_cbs_qopt opt;
> +
> +	sch->qstats.backlog = q->qdisc->qstats.backlog;
> +	nest = nla_nest_start(skb, TCA_OPTIONS);
> +	if (!nest)
> +		goto nla_put_failure;
> +
> +	opt.hicredit = q->hicredit;
> +	opt.locredit = q->locredit;
> +	opt.sendslope = q->sendslope;
> +	opt.idleslope = q->idleslope;
> +
> +	if (nla_put(skb, TCA_CBS_PARMS, sizeof(opt), &opt))
> +		goto nla_put_failure;
> +
> +	return nla_nest_end(skb, nest);
> +
> +nla_put_failure:
> +	nla_nest_cancel(skb, nest);
> +	return -1;
> +}
> +
> +static int cbs_dump_class(struct Qdisc *sch, unsigned long cl,
> +			  struct sk_buff *skb, struct tcmsg *tcm)
> +{
> +	struct cbs_sched_data *q = qdisc_priv(sch);
> +
> +	tcm->tcm_handle |= TC_H_MIN(1);
> +	tcm->tcm_info = q->qdisc->handle;
> +
> +	return 0;
> +}
> +
> +static int cbs_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
> +		     struct Qdisc **old)
> +{
> +	struct cbs_sched_data *q = qdisc_priv(sch);
> +
> +	if (!new)
> +		new = &noop_qdisc;
> +
> +	*old = qdisc_replace(sch, new, &q->qdisc);
> +	return 0;
> +}
> +
> +static struct Qdisc *cbs_leaf(struct Qdisc *sch, unsigned long arg)
> +{
> +	struct cbs_sched_data *q = qdisc_priv(sch);
> +
> +	return q->qdisc;
> +}
> +
> +static unsigned long cbs_find(struct Qdisc *sch, u32 classid)
> +{
> +	return 1;
> +}
> +
> +static int cbs_delete(struct Qdisc *sch, unsigned long arg)
> +{
> +	return 0;
> +}
> +
> +static void cbs_walk(struct Qdisc *sch, struct qdisc_walker *walker)
> +{
> +	if (!walker->stop) {
> +		if (walker->count >= walker->skip)
> +			if (walker->fn(sch, 1, walker) < 0) {
> +				walker->stop = 1;
> +				return;
> +			}
> +		walker->count++;
> +	}
> +}
> +
> +static const struct Qdisc_class_ops cbs_class_ops = {
> +	.graft		=	cbs_graft,
> +	.leaf		=	cbs_leaf,
> +	.find		=	cbs_find,
> +	.delete		=	cbs_delete,
> +	.walk		=	cbs_walk,
> +	.dump		=	cbs_dump_class,
> +};
> +
> +static struct Qdisc_ops cbs_qdisc_ops __read_mostly = {
> +	.next		=	NULL,
> +	.cl_ops		=	&cbs_class_ops,
> +	.id		=	"cbs",
> +	.priv_size	=	sizeof(struct cbs_sched_data),
> +	.enqueue	=	cbs_enqueue,
> +	.dequeue	=	cbs_dequeue,
> +	.peek		=	qdisc_peek_dequeued,
> +	.init		=	cbs_init,
> +	.reset		=	cbs_reset,
> +	.destroy	=	cbs_destroy,
> +	.change		=	cbs_change,
> +	.dump		=	cbs_dump,
> +	.owner		=	THIS_MODULE,
> +};
> +
> +static int __init cbs_module_init(void)
> +{
> +	return register_qdisc(&cbs_qdisc_ops);
> +}
> +
> +static void __exit cbs_module_exit(void)
> +{
> +	unregister_qdisc(&cbs_qdisc_ops);
> +}
> +module_init(cbs_module_init)
> +module_exit(cbs_module_exit)
> +MODULE_LICENSE("GPL");
> -- 
> 2.14.1
> 

-- 
Henrik Austad

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-08  1:29   ` Vinicius Costa Gomes
@ 2017-09-12  4:56     ` Richard Cochran
  0 siblings, 0 replies; 45+ messages in thread
From: Richard Cochran @ 2017-09-12  4:56 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: Henrik Austad, netdev, jhs, xiyou.wangcong, jiri,
	intel-wired-lan, andre.guedes, ivan.briano,
	jesus.sanchez-palencia, boon.leong.ong

On Thu, Sep 07, 2017 at 06:29:00PM -0700, Vinicius Costa Gomes wrote:
> >>  * Time-aware shaper (802.1Qbv):
> >>
> >>    The idea we are currently exploring is to add a "time-aware", priority based
> >>    qdisc, that also exposes the Tx queues available and provides a mechanism for
> >>    mapping priority <-> traffic class <-> Tx queues in a similar fashion as
> >>    mqprio. We are calling this qdisc 'taprio', and its 'tc' cmd line would be:
> >
> > As far as I know, this is not supported by i210, and if time-aware shaping
> > is enabled in the network - you'll be queued on a bridge until the window
> > opens as time-aware shaping is enforced on the tx-port and not on rx. Is
> > this required in this driver?
> 
> Yeah, i210 doesn't support the time-aware shaper. I think the second
> part of your question doesn't really apply, then.

Actually, you can implement 802.1Qbv (as an end station) quite easily
using the i210.  I'll show how by posting a series after net-next
opens up again.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 2/5] net/sched: Introduce Credit Based Shaper (CBS) qdisc
  2017-09-08 13:43   ` Henrik Austad
@ 2017-09-14  0:39     ` Vinicius Costa Gomes
  0 siblings, 0 replies; 45+ messages in thread
From: Vinicius Costa Gomes @ 2017-09-14  0:39 UTC (permalink / raw)
  To: Henrik Austad
  Cc: netdev, jhs, xiyou.wangcong, jiri, intel-wired-lan, andre.guedes,
	ivan.briano, jesus.sanchez-palencia, boon.leong.ong,
	richardcochran

Hi Henrik,

Henrik Austad <henrik@austad.us> writes:

> On Thu, Aug 31, 2017 at 06:26:22PM -0700, Vinicius Costa Gomes wrote:
>> This queueing discipline implements the shaper algorithm defined by
>> the 802.1Q-2014 Section 8.6.8.2 and detailed in Annex L.
>>
>> It's primary usage is to apply some bandwidth reservation to user
>> defined traffic classes, which are mapped to different queues via the
>> mqprio qdisc.
>>
>> Initially, it only supports offloading the traffic shaping work to
>> supporting controllers.
>>
>> Later, when a software implementation is added, the current dependency
>> on being installed "under" mqprio can be lifted.
>>
>> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
>> Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
>
> So, I started testing this in my VM to make sure I didn't do anything silly
> and it exploded spectacularly as it used the underlying e1000 driver which
> does not have a ndo_setup_tc present.
>
> I commented inline below, but as a tl;dr
>
> The cbs_init() would call into cbs_change() that would correctly detect
> that ndo_setup_tc is missing and abort. However, at this stage it would try
> to destroy the qdisc and in cbs_destroy() there's no such guard leading to a
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
>
> Fixing that, another NULL would be found when the code walks into
> qdisc_destroy(q->qdisc).
>
> Apart from this, it loaded fine after some wrestling with tc :)
>
> Awesome! :D
>
>> ---
>>  include/linux/netdevice.h |   1 +
>>  net/sched/Kconfig         |  11 ++
>>  net/sched/Makefile        |   1 +
>>  net/sched/sch_cbs.c       | 286 ++++++++++++++++++++++++++++++++++++++++++++++
>>  4 files changed, 299 insertions(+)
>>  create mode 100644 net/sched/sch_cbs.c
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 35de8312e0b5..dd9a2ecd0c03 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -775,6 +775,7 @@ enum tc_setup_type {
>>  	TC_SETUP_CLSFLOWER,
>>  	TC_SETUP_CLSMATCHALL,
>>  	TC_SETUP_CLSBPF,
>> +	TC_SETUP_CBS,
>>  };
>>
>>  /* These structures hold the attributes of xdp state that are being passed
>> diff --git a/net/sched/Kconfig b/net/sched/Kconfig
>> index e70ed26485a2..c03d86a7775e 100644
>> --- a/net/sched/Kconfig
>> +++ b/net/sched/Kconfig
>> @@ -172,6 +172,17 @@ config NET_SCH_TBF
>>  	  To compile this code as a module, choose M here: the
>>  	  module will be called sch_tbf.
>>
>> +config NET_SCH_CBS
>
> Shouldn't this depend on NET_SCH_MQPRIO as it is supposed to hook into
> that?

Right now, the CBS shaper only makes sense with mqprio; later it may use
some other qdisc (like the "taprio" mentioned in the cover letter) to hook
into. So, yes, you are right, there's a dependency here, better to make it
clear. Will fix.

>
> @@ -173,6 +173,7 @@ config NET_SCH_TBF
>           module will be called sch_tbf.
>
>  config NET_SCH_CBS
> +       depends on NET_SCH_MQPRIO
>         tristate "Credit Based Shaper (CBS)"
>         ---help---
>           Say Y here if you want to use the Credit Based Shaper (CBS) packet
>
>> +	tristate "Credit Based Shaper (CBS)"
>> +	---help---
>> +	  Say Y here if you want to use the Credit Based Shaper (CBS) packet
>> +	  scheduling algorithm.
>> +
>> +	  See the top of <file:net/sched/sch_cbs.c> for more details.
>> +
>> +	  To compile this code as a module, choose M here: the
>> +	  module will be called sch_cbs.
>> +
>>  config NET_SCH_GRED
>>  	tristate "Generic Random Early Detection (GRED)"
>>  	---help---
>> diff --git a/net/sched/Makefile b/net/sched/Makefile
>> index 7b915d226de7..80c8f92d162d 100644
>> --- a/net/sched/Makefile
>> +++ b/net/sched/Makefile
>> @@ -52,6 +52,7 @@ obj-$(CONFIG_NET_SCH_FQ_CODEL)	+= sch_fq_codel.o
>>  obj-$(CONFIG_NET_SCH_FQ)	+= sch_fq.o
>>  obj-$(CONFIG_NET_SCH_HHF)	+= sch_hhf.o
>>  obj-$(CONFIG_NET_SCH_PIE)	+= sch_pie.o
>> +obj-$(CONFIG_NET_SCH_CBS)	+= sch_cbs.o
>>
>>  obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
>>  obj-$(CONFIG_NET_CLS_ROUTE4)	+= cls_route.o
>> diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
>> new file mode 100644
>> index 000000000000..1c86a9e14150
>> --- /dev/null
>> +++ b/net/sched/sch_cbs.c
>> @@ -0,0 +1,286 @@
>> +/*
>> + * net/sched/sch_cbs.c	Credit Based Shaper
>> + *
>> + *		This program is free software; you can redistribute it and/or
>> + *		modify it under the terms of the GNU General Public License
>> + *		as published by the Free Software Foundation; either version
>> + *		2 of the License, or (at your option) any later version.
>> + *
>> + * Authors:	Vininicius Costa Gomes <vinicius.gomes@intel.com>
>> + *
>> + */
>> +
>> +#include <linux/module.h>
>> +#include <linux/types.h>
>> +#include <linux/kernel.h>
>> +#include <linux/string.h>
>> +#include <linux/errno.h>
>> +#include <linux/skbuff.h>
>> +#include <net/netlink.h>
>> +#include <net/sch_generic.h>
>> +#include <net/pkt_sched.h>
>> +
>> +struct cbs_sched_data {
>> +	struct Qdisc *qdisc; /* Inner qdisc, default - pfifo queue */
>> +	s32 queue;
>> +	s32 locredit;
>> +	s32 hicredit;
>> +	s32 sendslope;
>> +	s32 idleslope;
>> +};
>> +
>> +static int cbs_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>> +		       struct sk_buff **to_free)
>> +{
>> +	struct cbs_sched_data *q = qdisc_priv(sch);
>> +	int ret;
>> +
>> +	ret = qdisc_enqueue(skb, q->qdisc, to_free);
>> +	if (ret != NET_XMIT_SUCCESS) {
>> +		if (net_xmit_drop_count(ret))
>> +			qdisc_qstats_drop(sch);
>> +		return ret;
>> +	}
>> +
>> +	qdisc_qstats_backlog_inc(sch, skb);
>> +	sch->q.qlen++;
>> +	return NET_XMIT_SUCCESS;
>> +}
>> +
>> +static struct sk_buff *cbs_dequeue(struct Qdisc *sch)
>> +{
>> +	struct cbs_sched_data *q = qdisc_priv(sch);
>> +	struct sk_buff *skb;
>> +
>> +	skb = q->qdisc->ops->peek(q->qdisc);
>> +	if (skb) {
>> +		skb = qdisc_dequeue_peeked(q->qdisc);
>> +		if (unlikely(!skb))
>> +			return NULL;
>> +
>> +		qdisc_qstats_backlog_dec(sch, skb);
>> +		sch->q.qlen--;
>> +		qdisc_bstats_update(sch, skb);
>> +
>> +		return skb;
>> +	}
>> +	return NULL;
>> +}
>> +
>> +static void cbs_reset(struct Qdisc *sch)
>> +{
>> +	struct cbs_sched_data *q = qdisc_priv(sch);
>> +
>> +	qdisc_reset(q->qdisc);
>> +}
>> +
>> +static const struct nla_policy cbs_policy[TCA_CBS_MAX + 1] = {
>> +	[TCA_CBS_PARMS]	= { .len = sizeof(struct tc_cbs_qopt) },
>> +};
>> +
>> +static int cbs_change(struct Qdisc *sch, struct nlattr *opt)
>> +{
>> +	struct cbs_sched_data *q = qdisc_priv(sch);
>> +	struct tc_cbs_qopt_offload cbs = { };
>> +	struct nlattr *tb[TCA_CBS_MAX + 1];
>> +	const struct net_device_ops *ops;
>> +	struct tc_cbs_qopt *qopt;
>> +	struct net_device *dev;
>> +	int err;
>> +
>> +	err = nla_parse_nested(tb, TCA_CBS_MAX, opt, cbs_policy, NULL);
>> +	if (err < 0)
>> +		return err;
>> +
>> +	err = -EINVAL;
>> +	if (!tb[TCA_CBS_PARMS])
>> +		goto done;
>> +
>> +	qopt = nla_data(tb[TCA_CBS_PARMS]);
>> +
>> +	dev = qdisc_dev(sch);
>> +	ops = dev->netdev_ops;
>> +
>> +	/* FIXME: this means that we can only install this qdisc
>> +	 * "under" mqprio. Do we need a more generic way to retrieve
>> +	 * the queue, or do we pass the netdev_queue to the driver?
>> +	 */
>> +	cbs.queue = TC_H_MIN(sch->parent) - 1 - netdev_get_num_tc(dev);
>> +
>> +	cbs.enable = 1;
>> +	cbs.hicredit = qopt->hicredit;
>> +	cbs.locredit = qopt->locredit;
>> +	cbs.idleslope = qopt->idleslope;
>> +	cbs.sendslope = qopt->sendslope;
>> +
>> +	err = -ENOTSUPP;
>> +	if (!ops->ndo_setup_tc)
>> +		goto done;
>
> This confuses tc a bit, and looking at other qdisc schedulers, it seems
> like EOPNOTSUPP is an alternative, this changes the output from
>
> RTNETLINK answers: Unknown error 524
>  - to
> RTNETLINK answers: Operation not supported
>

Yeah, I missed this. Thanks for catching this.

> which perhaps is more informative.
>
>> +
>> +	err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CBS, &cbs);
>> +	if (err < 0)
>> +		goto done;
>> +
>> +	q->queue = cbs.queue;
>> +	q->hicredit = cbs.hicredit;
>> +	q->locredit = cbs.locredit;
>> +	q->idleslope = cbs.idleslope;
>> +	q->sendslope = cbs.sendslope;
>> +
>> +done:
>> +	return err;
>> +}
>> +
>> +static int cbs_init(struct Qdisc *sch, struct nlattr *opt)
>> +{
>> +	struct cbs_sched_data *q = qdisc_priv(sch);
>> +
>> +	if (!opt)
>> +		return -EINVAL;
>> +
>> +	q->qdisc = fifo_create_dflt(sch, &pfifo_qdisc_ops, 1024);
>> +	qdisc_hash_add(q->qdisc, true);
>> +
>> +	return cbs_change(sch, opt);
>> +}
>> +
>> +static void cbs_destroy(struct Qdisc *sch)
>> +{
>> +	struct cbs_sched_data *q = qdisc_priv(sch);
>> +	struct tc_cbs_qopt_offload cbs = { };
>> +	struct net_device *dev;
>> +	int err;
>> +
>> +	q->hicredit = 0;
>> +	q->locredit = 0;
>> +	q->idleslope = 0;
>> +	q->sendslope = 0;
>> +
>> +	dev = qdisc_dev(sch);
>> +
>> +	cbs.queue = q->queue;
>> +	cbs.enable = 0;
>> +
>> +	err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CBS, &cbs);
>
> This can lead to NULL pointer deref if it is not set (tested for in
> cbs_change and error there leads us here, which then explodes).

Fixed.

>
>> +	if (err < 0)
>> +		pr_warn("Couldn't reset queue %d to default values\n",
>> +			cbs.queue);
>> +
>> +	qdisc_destroy(q->qdisc);
>
> Same, q->qdisc was NULL when cbs_init() failed.
>

Fixed the error path, thanks.
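
For the record, the guards look roughly like this (sketch only, on top of
this RFC; the final patch may end up slightly different):

static void cbs_destroy(struct Qdisc *sch)
{
	struct cbs_sched_data *q = qdisc_priv(sch);
	struct net_device *dev = qdisc_dev(sch);
	struct tc_cbs_qopt_offload cbs = { };
	int err;

	/* cbs_init() may have bailed out before the offload was ever
	 * enabled (e.g. the driver has no ndo_setup_tc), so only ask
	 * the driver to disable it when the hook actually exists.
	 */
	if (dev->netdev_ops->ndo_setup_tc) {
		cbs.queue = q->queue;
		cbs.enable = 0;

		err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_CBS, &cbs);
		if (err < 0)
			pr_warn("Couldn't reset queue %d to default values\n",
				cbs.queue);
	}

	/* q->qdisc is still NULL if cbs_init() failed early. */
	if (q->qdisc)
		qdisc_destroy(q->qdisc);
}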


Cheers,

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-01  1:26 [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers Vinicius Costa Gomes
                   ` (6 preceding siblings ...)
  2017-09-07  5:34 ` Henrik Austad
@ 2017-09-18  8:02 ` Richard Cochran
  2017-09-18 11:46   ` Henrik Austad
  2017-09-18 23:06   ` Vinicius Costa Gomes
  2017-09-18  8:12 ` Richard Cochran
  2017-09-20  1:59 ` levipearson
  9 siblings, 2 replies; 45+ messages in thread
From: Richard Cochran @ 2017-09-18  8:02 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, jhs, xiyou.wangcong, jiri, intel-wired-lan, andre.guedes,
	ivan.briano, jesus.sanchez-palencia, boon.leong.ong

On Thu, Aug 31, 2017 at 06:26:20PM -0700, Vinicius Costa Gomes wrote:
>  * Time-aware shaper (802.1Qbv):

I just posted a working alternative showing how to handle 802.1Qbv and
many other Ethernet field buses.
 
>    The idea we are currently exploring is to add a "time-aware", priority based
>    qdisc, that also exposes the Tx queues available and provides a mechanism for
>    mapping priority <-> traffic class <-> Tx queues in a similar fashion as
>    mqprio. We are calling this qdisc 'taprio', and its 'tc' cmd line would be:
> 
>    $ $ tc qdisc add dev ens4 parent root handle 100 taprio num_tc 4    \
>      	   map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3                         \
> 	   queues 0 1 2 3                                              \
>      	   sched-file gates.sched [base-time <interval>]               \
>            [cycle-time <interval>] [extension-time <interval>]
> 
>    <file> is multi-line, with each line being of the following format:
>    <cmd> <gate mask> <interval in nanoseconds>
> 
>    Qbv only defines one <cmd>: "S" for 'SetGates'
> 
>    For example:
> 
>    S 0x01 300
>    S 0x03 500
> 
>    This means that there are two intervals, the first will have the gate
>    for traffic class 0 open for 300 nanoseconds, the second will have
>    both traffic classes open for 500 nanoseconds.

The idea of the schedule file will not work in practice.  Consider the
fact that the application wants to deliver time critical data in a
particular slot.  How can it find out a) what the time slots are and
b) when the next slot is scheduled?  With this Qdisc, it cannot do
this, AFAICT.  The admin might delete the file after configuring the
Qdisc!

Using the SO_TXTIME option, the application has total control over the
scheduling.  The great advantages of this approach is that we can
support any possible combination of periodic or aperiodic scheduling
and we can support any priority scheme user space dreams up.

For example, one can imagine running two or more loops that only
occasionally collide.  When they do collide, which packet should be
sent first?  Just let user space decide.
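
To make this concrete, the application side could look roughly like the
sketch below, assuming the socket has first been switched into this mode
with a setsockopt() call.  The option name, numeric values and cmsg layout
here are only illustrative assumptions about how such an interface might
end up being exposed:

#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

#ifndef SO_TXTIME			/* illustrative values only */
#define SO_TXTIME	61
#define SCM_TXTIME	SO_TXTIME
#endif

/* Queue one packet for transmission at 'txtime' (nanoseconds in the
 * clock the socket was configured for, e.g. CLOCK_TAI).
 */
static ssize_t send_at(int fd, const void *buf, size_t len, uint64_t txtime)
{
	char control[CMSG_SPACE(sizeof(txtime))] = { 0 };
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = control,
		.msg_controllen = sizeof(control),
	};
	struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);

	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_TXTIME;
	cm->cmsg_len = CMSG_LEN(sizeof(txtime));
	memcpy(CMSG_DATA(cm), &txtime, sizeof(txtime));

	return sendmsg(fd, &msg, 0);
}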

Thanks,
Richard

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-01  1:26 [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers Vinicius Costa Gomes
                   ` (7 preceding siblings ...)
  2017-09-18  8:02 ` Richard Cochran
@ 2017-09-18  8:12 ` Richard Cochran
  2017-09-20  5:17   ` TSN Scorecard, was " levipearson
  2017-09-20  1:59 ` levipearson
  9 siblings, 1 reply; 45+ messages in thread
From: Richard Cochran @ 2017-09-18  8:12 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, jhs, xiyou.wangcong, jiri, intel-wired-lan, andre.guedes,
	ivan.briano, jesus.sanchez-palencia, boon.leong.ong, henrik,
	tglx

On Thu, Aug 31, 2017 at 06:26:20PM -0700, Vinicius Costa Gomes wrote:
> This patchset is an RFC on a proposal of how the Traffic Control subsystem can
> be used to offload the configuration of traffic shapers into network devices
> that provide support for them in HW. Our goal here is to start upstreaming
> support for features related to the Time-Sensitive Networking (TSN) set of
> standards into the kernel.

Just for the record, here is my score card showing the current status
of TSN support in Linux.  Comments and corrections are most welcome.

Thanks,
Richard


 | FEATURE                                        | STANDARD            | STATUS                       |
 |------------------------------------------------+---------------------+------------------------------|
 | Synchronization                                | 802.1AS-2011        | Implemented in               |
 |                                                |                     | - Linux kernel PHC subsystem |
 |                                                |                     | - linuxptp (userspace)       |
 |------------------------------------------------+---------------------+------------------------------|
 | Forwarding and Queuing Enhancements            | 802.1Q-2014 sec. 34 | RFC posted (this thread)     |
 | for Time-Sensitive Streams (FQTSS)             |                     |                              |
 |------------------------------------------------+---------------------+------------------------------|
 | Stream Reservation Protocol (SRP)              | 802.1Q-2014 sec. 35 | in Open-AVB [1]              |
 |------------------------------------------------+---------------------+------------------------------|
 | Audio Video Transport Protocol (AVTP)          | IEEE 1722-2011      | DNE                          |
 |------------------------------------------------+---------------------+------------------------------|
 | Audio/Video Device Discovery, Enumeration,     | IEEE 1722.1-2013    | jdksavdecc-c [2]             |
 | Connection Management and Control (AVDECC)     |                     |                              |
 | AVDECC Connection Management Protocol (ACMP)   |                     |                              |
 | AVDECC Enumeration and Control Protocol (AECP) |                     |                              |
 | MAC Address Acquisition Protocol (MAAP)        |                     | in Open-AVB                  |
 |------------------------------------------------+---------------------+------------------------------|
 | Frame Preemption                               | P802.1Qbu           | DNE                          |
 | Scheduled Traffic                              | P802.1Qbv           | RFC posted (SO_TXTIME)       |
 | SRP Enhancements and Performance Improvements  | P802.1Qcc           | DNE                          |

 DNE = Does Not Exist (to my knowledge)

1. https://github.com/Avnu/OpenAvnu

   (DISCLAIMER from the website:)

   It is planned to eventually include the various packet encapsulation types,
   protocol discovery daemons, libraries to convert media clocks to AVB clocks
   and vice versa, and drivers.

   This repository does not include all components required to build a full
   production AVB/TSN system (e.g. a turnkey solution to stream stored or live audio
   or video content). Some simple example applications are provided which
   illustrate the flow - but a professional Audio/Video system requires a full media stack
   - including audio and video inputs and outputs, media processing elements, and
   various graphical user interfaces. Various companies provide such integrated
   solutions.

2. https://github.com/jdkoftinoff/jdksavdecc-c

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-18  8:02 ` Richard Cochran
@ 2017-09-18 11:46   ` Henrik Austad
  2017-09-18 23:06   ` Vinicius Costa Gomes
  1 sibling, 0 replies; 45+ messages in thread
From: Henrik Austad @ 2017-09-18 11:46 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Vinicius Costa Gomes, netdev, jhs, xiyou.wangcong, jiri,
	intel-wired-lan, andre.guedes, ivan.briano,
	jesus.sanchez-palencia, boon.leong.ong

[-- Attachment #1: Type: text/plain, Size: 3289 bytes --]

On Mon, Sep 18, 2017 at 10:02:14AM +0200, Richard Cochran wrote:
> On Thu, Aug 31, 2017 at 06:26:20PM -0700, Vinicius Costa Gomes wrote:
> >  * Time-aware shaper (802.1Qbv):
> 
> I just posted a working alternative showing how to handle 802.1Qbv and
> many other Ethernet field buses.

Yes, I saw them, grabbing them for testing now - thanks!

> >    The idea we are currently exploring is to add a "time-aware", priority based
> >    qdisc, that also exposes the Tx queues available and provides a mechanism for
> >    mapping priority <-> traffic class <-> Tx queues in a similar fashion as
> >    mqprio. We are calling this qdisc 'taprio', and its 'tc' cmd line would be:
> > 
> >    $ $ tc qdisc add dev ens4 parent root handle 100 taprio num_tc 4    \
> >      	   map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3                         \
> > 	   queues 0 1 2 3                                              \
> >      	   sched-file gates.sched [base-time <interval>]               \
> >            [cycle-time <interval>] [extension-time <interval>]
> > 
> >    <file> is multi-line, with each line being of the following format:
> >    <cmd> <gate mask> <interval in nanoseconds>
> > 
> >    Qbv only defines one <cmd>: "S" for 'SetGates'
> > 
> >    For example:
> > 
> >    S 0x01 300
> >    S 0x03 500
> > 
> >    This means that there are two intervals, the first will have the gate
> >    for traffic class 0 open for 300 nanoseconds, the second will have
> >    both traffic classes open for 500 nanoseconds.
> 
> The idea of the schedule file will not work in practice.  Consider the
> fact that the application wants to deliver time critical data in a
> particular slot.  How can it find out a) what the time slots are and
> b) when the next slot is scheduled?  With this Qdisc, it cannot do
> this, AFAICT.  The admin might delete the file after configuring the
> Qdisc!
> 
> Using the SO_TXTIME option, the application has total control over the
> scheduling.  The great advantages of this approach is that we can
> support any possible combination of periodic or aperiodic scheduling
> and we can support any priority scheme user space dreams up.

Using SO_TXTIME makes a lot of sense. TSN has a presentation_time, which 
you can use to deduce the time a frame should be transmitted (Class A has 
a 2 ms latency guarantee, Class B has 50 ms), but given how TSN carries 
that timestamp (a 32-bit nanoseconds field), it wraps every ~4.3 seconds; 
using SO_TXTIME allows you to schedule transmission at a much later time. 
It should also lessen the dependency on a specific protocol, which is also 
good.

> For example, one can imagine running two or more loops that only
> occasionally collide.  When they do collide, which packet should be
> sent first?  Just let user space decide.

If 2 userspace apps send to the same Tx-queue with the same priority, would 
it not make sense to just do FIFO? For all practical purposes, they have 
the same importance (same SO_PRIORITY, same SO_TXTIME). If the priority 
differs, then they would be directed to different queues, where one queue 
will take precedence anyway.

How far into the future would it make sense to schedule packets anyway?

I'll have a look at the other series you just posted!

-- 
Henrik Austad

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-18  8:02 ` Richard Cochran
  2017-09-18 11:46   ` Henrik Austad
@ 2017-09-18 23:06   ` Vinicius Costa Gomes
  2017-09-19  5:22     ` Richard Cochran
  1 sibling, 1 reply; 45+ messages in thread
From: Vinicius Costa Gomes @ 2017-09-18 23:06 UTC (permalink / raw)
  To: Richard Cochran
  Cc: netdev, jhs, xiyou.wangcong, jiri, intel-wired-lan, andre.guedes,
	ivan.briano, jesus.sanchez-palencia, boon.leong.ong

Hi Richard,

Richard Cochran <richardcochran@gmail.com> writes:

> On Thu, Aug 31, 2017 at 06:26:20PM -0700, Vinicius Costa Gomes wrote:
>>  * Time-aware shaper (802.1Qbv):
>
> I just posted a working alternative showing how to handle 802.1Qbv and
> many other Ethernet field buses.
>
>>    The idea we are currently exploring is to add a "time-aware", priority based
>>    qdisc, that also exposes the Tx queues available and provides a mechanism for
>>    mapping priority <-> traffic class <-> Tx queues in a similar fashion as
>>    mqprio. We are calling this qdisc 'taprio', and its 'tc' cmd line would be:
>>
>>    $ $ tc qdisc add dev ens4 parent root handle 100 taprio num_tc 4    \
>>      	   map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3                         \
>> 	   queues 0 1 2 3                                              \
>>      	   sched-file gates.sched [base-time <interval>]               \
>>            [cycle-time <interval>] [extension-time <interval>]
>>
>>    <file> is multi-line, with each line being of the following format:
>>    <cmd> <gate mask> <interval in nanoseconds>
>>
>>    Qbv only defines one <cmd>: "S" for 'SetGates'
>>
>>    For example:
>>
>>    S 0x01 300
>>    S 0x03 500
>>
>>    This means that there are two intervals, the first will have the gate
>>    for traffic class 0 open for 300 nanoseconds, the second will have
>>    both traffic classes open for 500 nanoseconds.
>
> The idea of the schedule file will not work in practice.  Consider the
> fact that the application wants to deliver time critical data in a
> particular slot.  How can it find out a) what the time slots are and
> b) when the next slot is scheduled?  With this Qdisc, it cannot do
> this, AFAICT.  The admin might delete the file after configuring the
> Qdisc!

That's the point, the application does not need to know that, and asking
that would be stupid. From the point of view of the Qbv specification,
applications only need to care about their basic bandwidth requirements:
interval, frame size, frames per interval (using the terms of the
SRP section of 802.1Q). The traffic schedule is provided (out of band) by a
"god box" which knows the requirements of all applications on all
the nodes and how they are connected.

(And that's another nice point of how 802.1Qbv works, applications do
not need to be changed to use it, and I think we should work to achieve
this on the Linux side)

That being said, that only works for kinds of traffic that maps well to
this configuration in advance model, which is the model that the IEEE
(see 802.1Qcc) and the AVNU Alliance[1] are pushing for.

In the real world, I can see multiple types of applications, some using
something like TXTIME, and some configured in advance.

>
> Using the SO_TXTIME option, the application has total control over the
> scheduling.  The great advantages of this approach is that we can
> support any possible combination of periodic or aperiodic scheduling
> and we can support any priority scheme user space dreams up.

It has the disadvantage that the scheduling information has to be
in-band with the data. I *really* think that for scheduled traffic
there should be a clear separation: we should not mix the dataflow with
scheduling. In short, an application in the network doesn't need to have
all the information necessary to schedule its own traffic well.

I have two points here: 1. I see both "solutions" (taprio and SO_TXTIME)
as orthogonal, and both useful; 2. trying to make one do the job
of the other, however, looks like "If all I have is a hammer, everything
looks like a nail".

In short, I see a per-packet transmission time and a per-queue schedule
as solutions to different problems.

>
> For example, one can imagine running two or more loops that only
> occasionally collide.  When they do collide, which packet should be
> sent first?  Just let user space decide.
>
> Thanks,
> Richard

Cheers,

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-18 23:06   ` Vinicius Costa Gomes
@ 2017-09-19  5:22     ` Richard Cochran
  2017-09-19 13:14       ` Henrik Austad
  2017-09-20  0:19       ` Vinicius Costa Gomes
  0 siblings, 2 replies; 45+ messages in thread
From: Richard Cochran @ 2017-09-19  5:22 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, jhs, xiyou.wangcong, jiri, intel-wired-lan, andre.guedes,
	ivan.briano, jesus.sanchez-palencia, boon.leong.ong

On Mon, Sep 18, 2017 at 04:06:28PM -0700, Vinicius Costa Gomes wrote:
> That's the point, the application does not need to know that, and asking
> that would be stupid.

On the contrary, this information is essential to the application.
Probably you have never seen an actual Ethernet field bus in
operation?  In any case, you are missing the point.

> (And that's another nice point of how 802.1Qbv works, applications do
> not need to be changed to use it, and I think we should work to achieve
> this on the Linux side)

Once you start to care about real time performance, then you need to
consider the applications.  This is industrial control, not streaming
your tunes from your ipod.
 
> That being said, that only works for kinds of traffic that maps well to
> this configuration in advance model, which is the model that the IEEE
> (see 802.1Qcc) and the AVNU Alliance[1] are pushing for.

Again, you are missing the point of what they are aiming for.  I have
looked at a number of production systems, and in each case the
developers want total control over the transmission, in order to
reduce latency to an absolute minimum.  Typically the data to be sent
are available only microseconds before the transmission deadline.

Consider OpenAVB on github that people are already using.  Take a look
at simple_talker.c and explain how "applications do not need to be
changed to use it."

> [1]
> http://avnu.org/theory-of-operation-for-tsn-enabled-industrial-systems/

Did you even read this?

    [page 24]

    As described in section 2, some industrial control systems require
    predictable, very low latency and cycle-to-cycle variation to meet
    hard real-time application requirements. In these systems,
    multiple distributed controllers commonly synchronize their
    sensor/actuator operations with other controllers by scheduling
    these operations in time, typically using a repeating control
    cycle.
    ...
    The gate control mechanism is itself a time-aware PTP application
    operating within a bridge or end station port.

It is an application, not a "god box."

> In short, I see a per-packet transmission time and a per-queue schedule
> as solutions to different problems.

Well, I can agree with that.  For some non real-time applications,
bandwidth shaping is enough, and your Qdisc idea is sufficient.  For
the really challenging TSN targets (industrial control, automotive),
your idea of an opaque schedule file won't fly.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-19  5:22     ` Richard Cochran
@ 2017-09-19 13:14       ` Henrik Austad
  2017-09-20  0:19       ` Vinicius Costa Gomes
  1 sibling, 0 replies; 45+ messages in thread
From: Henrik Austad @ 2017-09-19 13:14 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Vinicius Costa Gomes, netdev, jhs, xiyou.wangcong, jiri,
	intel-wired-lan, andre.guedes, ivan.briano,
	jesus.sanchez-palencia, boon.leong.ong

[-- Attachment #1: Type: text/plain, Size: 4710 bytes --]

Hi all,

On Tue, Sep 19, 2017 at 07:22:44AM +0200, Richard Cochran wrote:
> On Mon, Sep 18, 2017 at 04:06:28PM -0700, Vinicius Costa Gomes wrote:
> > That's the point, the application does not need to know that, and asking
> > that would be stupid.
> 
> On the contrary, this information is essential to the application.
> Probably you have never seen an actual Ethernet field bus in
> operation?  In any case, you are missing the point.
> 
> > (And that's another nice point of how 802.1Qbv works, applications do
> > not need to be changed to use it, and I think we should work to achieve
> > this on the Linux side)
> 
> Once you start to care about real time performance, then you need to
> consider the applications.  This is industrial control, not streaming
> your tunes from your ipod.

Do not underestimate the need for media over TSN. I fully see your point 
about real-time systems, but they are not the only valid use cases for TSN.

> > That being said, that only works for kinds of traffic that maps well to
> > this configuration in advance model, which is the model that the IEEE
> > (see 802.1Qcc) and the AVNU Alliance[1] are pushing for.
> 
> Again, you are missing the point of what they are aiming for.  I have
> looked at a number of production systems, and in each case the
> developers want total control over the transmission, in order to
> reduce latency to an absolute minimum.  Typically the data to be sent
> are available only microseconds before the transmission deadline.
> 
> Consider OpenAVB on github that people are already using.  Take a look
> at simple_talker.c and explain how "applications do not need to be
> changed to use it."

I do not think simple_talker was ever intended to be how AVB applications 
should be implemented, but rather as a demonstration of what the protocol 
could do.

ALSA/V4L2 should supply some interface to this so that you can attach 
media-applications to it without the application itself having to be "TSN 
aware".

> > [1]
> > http://avnu.org/theory-of-operation-for-tsn-enabled-industrial-systems/
> 
> Did you even read this?
> 
>     [page 24]
> 
>     As described in section 2, some industrial control systems require
>     predictable, very low latency and cycle-to-cycle variation to meet
>     hard real-time application requirements. In these systems,
>     multiple distributed controllers commonly synchronize their
>     sensor/actuator operations with other controllers by scheduling
>     these operations in time, typically using a repeating control
>     cycle.
>     ...
>     The gate control mechanism is itself a time-aware PTP application
>     operating within a bridge or end station port.
> 
> It is an application, not a "god box."
>
> > In short, I see a per-packet transmission time and a per-queue schedule
> > as solutions to different problems.
> 
> Well, I can agree with that.  For some non real-time applications,
> bandwidth shaping is enough, and your Qdisc idea is sufficient.  For
> the really challenging TSN targets (industrial control, automotive),
> your idea of an opaque schedule file won't fly.

Would it make sense to adapt the proposed Qdisc here as well as the 
back-o-the-napkin idea in the other thread to to a per-socket queue for 
each priority and then sort those sockets based on SO_TXTIME?

TSN operates on a per-StreamID basis, and that should map fairly well to a 
per-socket approach I think (let us just assume that an application that 
sends TSN traffic will open up a separate socket for each stream.

This should allow a userspace application that is _very_ aware of its 
timing constraints to send frames exactly when it needs to as you have 
SO_TXTIME available. It would also let applications that basically want a 
fine-grained rate control (audio and video comes to mind) to use the same 
qdisc.

For those sockets that do not support SO_TXTIME, but still map to a 
priority handled by sch_cbs (or whatever it'll end up being called), you 
can set the transmit time to be the time of the last skb in the queue plus 
a delta that gives you the correct rate (TSN operates on observation 
intervals, which you can specify via tc when you create the queues).

Then you can have, as you propose in your other series, an hrtimer that 
fires when the next SO_TXTIME deadline comes up, grabs an skb and moves it 
to the hw-queue. This should also allow you to keep a sorted per-socket 
queue should an application send frames in the wrong order, without having 
to rearrange descriptors for the DMA machinery.
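
Napkin-level sketch of the sorted insert I have in mind, assuming the
requested transmit time ends up being carried in skb->tstamp (that is just
an assumption on my part, it could live somewhere else entirely):

#include <linux/skbuff.h>
#include <linux/ktime.h>

static void txtime_enqueue_sorted(struct sk_buff_head *list,
				  struct sk_buff *nskb)
{
	struct sk_buff *skb;

	/* Walk from the tail: well-behaved senders transmit roughly in
	 * order, so the common case is a cheap append.
	 */
	skb_queue_reverse_walk(list, skb) {
		if (ktime_compare(skb->tstamp, nskb->tstamp) <= 0) {
			__skb_queue_after(list, skb, nskb);
			return;
		}
	}
	__skb_queue_head(list, nskb);
}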

If this makes sense, I am more than happy to give it a stab and see how it 
goes.

-Henrik

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-19  5:22     ` Richard Cochran
  2017-09-19 13:14       ` Henrik Austad
@ 2017-09-20  0:19       ` Vinicius Costa Gomes
  2017-09-20  5:25         ` Richard Cochran
  2017-09-20  5:58         ` Richard Cochran
  1 sibling, 2 replies; 45+ messages in thread
From: Vinicius Costa Gomes @ 2017-09-20  0:19 UTC (permalink / raw)
  To: Richard Cochran
  Cc: netdev, jhs, xiyou.wangcong, jiri, intel-wired-lan, andre.guedes,
	ivan.briano, jesus.sanchez-palencia, boon.leong.ong

Hi Richard,

Richard Cochran <richardcochran@gmail.com> writes:

> On Mon, Sep 18, 2017 at 04:06:28PM -0700, Vinicius Costa Gomes wrote:
>> That's the point, the application does not need to know that, and asking
>> that would be stupid.
>
> On the contrary, this information is essential to the application.
> Probably you have never seen an actual Ethernet field bus in
> operation?  In any case, you are missing the point.
>
>> (And that's another nice point of how 802.1Qbv works, applications do
>> not need to be changed to use it, and I think we should work to achieve
>> this on the Linux side)
>
> Once you start to care about real time performance, then you need to
> consider the applications.  This is industrial control, not streaming
> your tunes from your ipod.
>
>> That being said, that only works for kinds of traffic that maps well to
>> this configuration in advance model, which is the model that the IEEE
>> (see 802.1Qcc) and the AVNU Alliance[1] are pushing for.
>
> Again, you are missing the point of what they are aiming for.  I have
> looked at a number of production systems, and in each case the
> developers want total control over the transmission, in order to
> reduce latency to an absolute minimum.  Typically the data to be sent
> are available only microseconds before the transmission deadline.
>
> Consider OpenAVB on github that people are already using.  Take a look
> at simple_talker.c and explain how "applications do not need to be
> changed to use it."

Just let me use the mention of OpenAVNU as a hook to explain what we
(the team I am part of) are working towards; perhaps it will make our
choices and designs clearer.

One of the problems with OpenAVNU is that it's too coupled with the i210
NIC. One of the things we want is to decouple OpenAVNU from the
controller. The way we thought best was to propose interfaces (that
would work along side to the Linux networking stack) as close as
possible to what the current standards define, that means the IEEE
802.1Q family of specifications, in the hope that network controller
vendors would also look at the specifications when designing their
controllers.

Our objective with the Qdiscs we are proposing (both cbs and taprio) is
to provide a sane way to configure controllers that support TSN features
(we were looking specifically at the IEEE specs).

After we have some rough consensus on the interfaces to use, then we can
start working on OpenAVNU.

>
>> [1]
>> http://avnu.org/theory-of-operation-for-tsn-enabled-industrial-systems/
>
> Did you even read this?
>
>     [page 24]
>
>     As described in section 2, some industrial control systems require
>     predictable, very low latency and cycle-to-cycle variation to meet
>     hard real-time application requirements. In these systems,
>     multiple distributed controllers commonly synchronize their
>     sensor/actuator operations with other controllers by scheduling
>     these operations in time, typically using a repeating control
>     cycle.
>     ...
>     The gate control mechanism is itself a time-aware PTP application
>     operating within a bridge or end station port.
>
> It is an application, not a "god box."
>
>> In short, I see a per-packet transmission time and a per-queue schedule
>> as solutions to different problems.
>
> Well, I can agree with that.  For some non real-time applications,
> bandwidth shaping is enough, and your Qdisc idea is sufficient.  For
> the really challenging TSN targets (industrial control, automotive),
> your idea of an opaque schedule file won't fly.

(Sorry if I am being annoying here, but the idea of an opaque schedule
is not ours; it comes from the people who wrote the Qbv specification)

I have a question: what about a controller that doesn't provide a way to
set a per-packet transmission time, but does support Qbv/Qbu? What would
be your proposal to configure it?

(I think LaunchTime is something specific to the i210, right?)


Cheers,

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-01  1:26 [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers Vinicius Costa Gomes
                   ` (8 preceding siblings ...)
  2017-09-18  8:12 ` Richard Cochran
@ 2017-09-20  1:59 ` levipearson
  2017-09-20  5:56   ` Richard Cochran
  9 siblings, 1 reply; 45+ messages in thread
From: levipearson @ 2017-09-20  1:59 UTC (permalink / raw)
  To: vinicius.gomes
  Cc: netdev, intel-wired-lan, andre.guedes, ivan.briano,
	jesus.sanchez-palencia, boon.leong.ong, jhs, xiyou.wangcong,
	jiri, richardcochran, henrik


On Thu, Aug 31, 2017 at 06:26:20PM -0700, Vinicius Costa Gomes wrote:
> Hi,
> 
> This patchset is an RFC on a proposal of how the Traffic Control subsystem can
> be used to offload the configuration of traffic shapers into network devices
> that provide support for them in HW. Our goal here is to start upstreaming
> support for features related to the Time-Sensitive Networking (TSN) set of
> standards into the kernel.

I'm very excited to see these features moving into the kernel! I am one of the
maintainers of the OpenAvnu project and I've been involved in building AVB/TSN
systems and working on the standards for around 10 years, so the support that's
been slowly making it into more silicon and now Linux drivers is very
encouraging.

My team at Harman is working on endpoint code based on what's in the OpenAvnu
project and a few Linux-based platforms. The Qav interface you've proposed will
fit nicely with our traffic shaper management daemon, which already uses mqprio
as a base but uses the htb shaper to approximate the Qav credit-based shaper on
platforms where launch time scheduling isn't available.

I've applied your patches and plan on testing them in conjunction with our
shaper manager to see if we run into any hitches, but I don't expect any
problems.

> As part of this work, we've assessed previous public discussions related to TSN
> enabling: patches from Henrik Austad (Cisco), the presentation from Eric Mann
> at Linux Plumbers 2012, patches from Gangfeng Huang (National Instruments) and
> the current state of the OpenAVNU project (https://github.com/AVnu/OpenAvnu/).
> 
> Please note that the patches provided as part of this RFC are implementing what
> is needed only for 802.1Qav (FQTSS) only, but we'd like to take advantage of
> this discussion and share our WIP ideas for the 802.1Qbv and 802.1Qbu interfaces
> as well. The current patches are only providing support for HW offload of the
> configs.
> 
> 
> Overview
> ========
> 
> Time-sensitive Networking (TSN) is a set of standards that aim to address
> resources availability for providing bandwidth reservation and bounded latency
> on Ethernet based LANs. The proposal described here aims to cover mainly what is
> needed to enable the following standards: 802.1Qat, 802.1Qav, 802.1Qbv and
> 802.1Qbu.
> 
> The initial target of this work is the Intel i210 NIC, but other controllers'
> datasheet were also taken into account, like the Renesas RZ/A1H RZ/A1M group and
> the Synopsis DesignWare Ethernet QoS controller.

Recent SoCs from NXP (the i.MX 6 SoloX, and all the i.MX 7 and 8 parts) support
Qav shaping as well as scheduled launch functionality; these are the parts I 
have been mostly working with. Marvell silicon (some subset of Armada processors
and Link Street DSA switches) generally supports traffic shaping as well.

I think a lack of an interface like this has probably slowed upstream driver
support for this functionality where it exists; most vendors have an out-of-
tree version of their driver with TSN functionality enabled via non-standard
interfaces. Hopefully making it available will encourage vendors to upstream
their driver support!

> Proposal
> ========
> 
> Feature-wise, what is covered here are configuration interfaces for HW
> implementations of the Credit-Based shaper (CBS, 802.1Qav), Time-Aware shaper
> (802.1Qbv) and Frame Preemption (802.1Qbu). CBS is a per-queue shaper, while
> Qbv and Qbu must be configured per port, with the configuration covering all
> queues. Given that these features are related to traffic shaping, and that the
> traffic control subsystem already provides a queueing discipline that offloads
> config into the device driver (i.e. mqprio), designing new qdiscs for the
> specific purpose of offloading the config for each shaper seemed like a good
> fit.

This makes sense to me too. The 802.1Q standards are all based on the sort of
mappings between priority, traffic class, and hardware queues that the existing
tc infrastructure seems to be modeling. I believe the mqprio module's mapping
scheme is flexible enough to meet any TSN needs in conjunction with the other
parts of the kernel qdisc system.

> For steering traffic into the correct queues, we use the socket option
> SO_PRIORITY and then a mechanism to map priority to traffic classes / Txqueues.
> The qdisc mqprio is currently used in our tests.
> 
> As for the shapers config interface:
> 
>  * CBS (802.1Qav)
> 
>    This patchset is proposing a new qdisc called 'cbs'. Its 'tc' cmd line is:
>    $ tc qdisc add dev IFACE parent ID cbs locredit N hicredit M sendslope S \
>      idleslope I
> 
>    Note that the parameters for this qdisc are the ones defined by the
>    802.1Q-2014 spec, so no hardware specific functionality is exposed here.

These parameters look good to me as a baseline; some additional optional
parameters may be useful for software-based implementations--such as setting an
interval at which to recalculate queues--but those can be discussed later.

>  * Time-aware shaper (802.1Qbv):

I haven't come across any specific NIC or SoC MAC that does Qbv, but I have
been experimenting with an EspressoBin board, which has a "Topaz" DSA switch
in it that has some features intended for Qbv support, although they were done
with a draft version in mind.

I haven't looked at the interaction between the qdisc subsystem and DSA yet,
but this mechanism might be useful to configure Qbv on the slave ports in
that context. I've got both the board and the documentation, so I might be
able to work on an implementation at some point.

If some endpoint device shows up with direct Qbv support, this interface would
probably work well there too, although a talker would need to be able to
schedule its transmits pretty precisely to achieve the lowest possible latency.

>    The idea we are currently exploring is to add a "time-aware", priority based
>    qdisc, that also exposes the Tx queues available and provides a mechanism for
>    mapping priority <-> traffic class <-> Tx queues in a similar fashion as
>    mqprio. We are calling this qdisc 'taprio', and its 'tc' cmd line would be:
> 
>    $ $ tc qdisc add dev ens4 parent root handle 100 taprio num_tc 4    \
>      	   map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3                         \
> 	   queues 0 1 2 3                                              \
>      	   sched-file gates.sched [base-time <interval>]               \
>            [cycle-time <interval>] [extension-time <interval>]

One concern here is calling the base-time parameter an interval; it's really
an absolute time with respect to the PTP timescale. Good documentation will
be important to this one, since the specification discusses some subtleties
regarding the impact of different time values chosen here.

The format for specifying the actual intervals such as cycle-time could prove
to be an important detail as well; Qbv specifies cycle-time as a ratio of two
integers expressed in seconds, while extension-time is specified as an integer
number of nanoseconds.

Precision with the cycle-time is especially important, since base-time can be
almost arbitrarily far in the past or future, and any given cycle start should
be calculable from the base-time plus/minus some integer multiple of cycle-
time.
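
To illustrate why the representation matters, the arithmetic everyone ends
up doing looks roughly like this (a sketch, assuming for simplicity that
both values are plain nanosecond counts):

#include <stdint.h>

/* Return the first cycle start at or after 'now'.  base_time may be far
 * in the past or the future; cycle_time must be non-zero.  If cycle_time
 * is instead kept in Qbv's rational form (a ratio of two integers in
 * seconds), this division needs to be carried out in that form, or the
 * computed cycle starts will slowly drift away from the bridge's.
 */
static uint64_t next_cycle_start(uint64_t now, uint64_t base_time,
				 uint64_t cycle_time)
{
	uint64_t n;

	if (now <= base_time)
		return base_time;

	n = (now - base_time + cycle_time - 1) / cycle_time;
	return base_time + n * cycle_time;
}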

>    <file> is multi-line, with each line being of the following format:
>    <cmd> <gate mask> <interval in nanoseconds>
> 
>    Qbv only defines one <cmd>: "S" for 'SetGates'
> 
>    For example:
> 
>    S 0x01 300
>    S 0x03 500
> 
>    This means that there are two intervals, the first will have the gate
>    for traffic class 0 open for 300 nanoseconds, the second will have
>    both traffic classes open for 500 nanoseconds.
> 
>    Additionally, an option to set just one entry of the gate control list will
>    also be provided by 'taprio':
> 
>    $ tc qdisc (...) \
>         sched-row <row number> <cmd> <gate mask> <interval>  \
>         [base-time <interval>] [cycle-time <interval>] \
>         [extension-time <interval>]

If I understand correctly, 'sched-row' is meant to be usable multiple times in
a single command and the 'sched-file' option is just a shorthand version for
large tables? Or is it meant to update an existing schedule table? It doesn't
seem very useful if it can only be specified once when the whole taprio instance
is being established.

>  * Frame Preemption (802.1Qbu):
> 
>    To control even further the latency, it may prove useful to signal which
>    traffic classes are marked as preemptable. For that, 'taprio' provides the
>    preemption command so you set each traffic class as preemptable or not:
> 
>    $ tc qdisc (...) \
>         preemption 0 1 1 1
> 
> 
>  * Time-aware shaper + Preemption:
> 
>    As an example of how Qbv and Qbu can be used together, we may specify
>    both the schedule and the preempt-mask, and this way we may also
>    specify the Set-Gates-and-Hold and Set-Gates-and-Release commands as
>    specified in the Qbu spec:
> 
>    $ tc qdisc add dev ens4 parent root handle 100 taprio num_tc 4 \
>      	   map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3                    \
> 	   queues 0 1 2 3                                         \
>      	   preemption 0 1 1 1                                     \
> 	   sched-file preempt_gates.sched
> 
>     <file> is multi-line, with each line being of the following format:
>     <cmd> <gate mask> <interval in nanoseconds>
> 
>     For this case, two new commands are introduced:
> 
>     "H" for 'set gates and hold'
>     "R" for 'set gates and release'
> 
>     H 0x01 300
>     R 0x03 500
> 

The new Hold and Release gate commands look right, but I'm not sure about the
preemption flags. Qbu describes a preemption parameter table indexed by
*priority* rather than traffic class or queue. These select which of two MAC
service interfaces is used by the frame at the ISS layer, either express or
preemptable, at the time the frame is selected for transmit. If my
understanding is correct, it's possible to map a preemptable priority as well
as an express priority to the same queue, so flagging preemptability at the
queue level is not correct.
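
As a purely illustrative sketch of that distinction (the table and names below
are invented for the example and are not from any driver or from the RFC):

    /* Qbu describes preemption status per *priority*: each of the eight
     * priority values selects either the express MAC (eMAC) or the
     * preemptable MAC (pMAC) at frame-selection time.
     */
    enum mac_service { FRAME_EXPRESS, FRAME_PREEMPTABLE };

    static const enum mac_service preemption_by_prio[8] = {
            [0] = FRAME_PREEMPTABLE,  /* best effort */
            [1] = FRAME_PREEMPTABLE,
            [2] = FRAME_PREEMPTABLE,
            [3] = FRAME_EXPRESS,      /* e.g. a class A stream priority */
            [4] = FRAME_EXPRESS,
            [5] = FRAME_PREEMPTABLE,
            [6] = FRAME_PREEMPTABLE,
            [7] = FRAME_PREEMPTABLE,
    };

    /* With a priority-to-queue map that sends priorities 2 and 3 to the
     * same TX queue, that queue carries both express and preemptable
     * frames -- something a single per-queue flag cannot represent.
     */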

I'm not aware of any endpoint interfaces that support Qbu either, nor do I 
know of any switches that support it that someone could experiment with right
now, so there's no pressure on getting that interface nailed down yet.

Hopefully you find this feedback useful, and I appreciate the effort taken to
get the RFC posted here!

Levi

^ permalink raw reply	[flat|nested] 45+ messages in thread

* TSN Scorecard, was Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-18  8:12 ` Richard Cochran
@ 2017-09-20  5:17   ` levipearson
  2017-09-20  5:49     ` Richard Cochran
  0 siblings, 1 reply; 45+ messages in thread
From: levipearson @ 2017-09-20  5:17 UTC (permalink / raw)
  To: richardcochran
  Cc: netdev, intel-wired-lan, vinicius.gomes, andre.guedes,
	ivan.briano, jesus.sanchez-palencia, boon.leong.ong, jhs,
	xiyou.wangcong, jiri, henrik

On Mon, Sep 18, 2017, Richard Cochran wrote:
> Just for the record, here is my score card showing the current status
> of TSN support in Linux.  Comments and corrections are most welcome.
> 
> Thanks,
> Richard
> 
> 
>  | FEATURE                                        | STANDARD            | STATUS                       |
>  |------------------------------------------------+---------------------+------------------------------|
>  | Synchronization                                | 802.1AS-2011        | Implemented in               |
>  |                                                |                     | - Linux kernel PHC subsystem |
>  |                                                |                     | - linuxptp (userspace)       |
>  |------------------------------------------------+---------------------+------------------------------|

An alternate implementation of the userspace portion of gPTP is also available at [1]

>  | Forwarding and Queuing Enhancements            | 802.1Q-2014 sec. 34 | RFC posted (this thread)     |
>  | for Time-Sensitive Streams (FQTSS)             |                     |                              |
>  |------------------------------------------------+---------------------+------------------------------|
>  | Stream Reservation Protocol (SRP)              | 802.1Q-2014 sec. 35 | in Open-AVB [1]              |
>  |------------------------------------------------+---------------------+------------------------------|
>  | Audio Video Transport Protocol (AVTP)          | IEEE 1722-2011      | DNE                          |
>  |------------------------------------------------+---------------------+------------------------------|
>  | Audio/Video Device Discovery, Enumeration,     | IEEE 1722.1-2013    | jdksavdecc-c [2]             |
>  | Connection Management and Control (AVDECC)     |                     |                              |
>  | AVDECC Connection Management Protocol (ACMP)   |                     |                              |
>  | AVDECC Enumeration and Control Protocol (AECP) |                     |                              |
>  | MAC Address Acquisition Protocol (MAAP)        |                     | in Open-AVB                  |
>  |------------------------------------------------+---------------------+------------------------------|

All of the above are available to some degree in the AVTP Pipeline part of [1], specifically at this
location: https://github.com/AVnu/OpenAvnu/tree/master/lib/avtp_pipeline

The code is very modular and configurable, although some parts are in better shape than others. The AVTP
portion can use the custom userspace driver for the i210, which can be configured to use launch scheduling,
or it can use standard kernel interfaces via sendmsg or PACKET_MMAP. It runs as-is when configured for
standard interfaces with any network hardware that supports gPTP. I previously implemented a CMSG-based
launch time scheduling mechanism like the one you have proposed, and I have a socket backend for it that
could easily be ported to your proposal. It is not part of the repository yet since there's no kernel
support for it outside of my prototype and your RFC.
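
For context, a minimal sketch of what such a CMSG-based submission looks like
from user space; the SO_TXTIME/SCM_TXTIME names, the option value and the
struct layout below are assumptions based on the proposal under discussion,
not a description of a merged kernel interface:

    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <sys/uio.h>
    #include <time.h>

    #ifndef SO_TXTIME
    #define SO_TXTIME   61          /* placeholder value */
    #define SCM_TXTIME  SO_TXTIME
    #endif

    /* Assumed option argument: which clock the launch times refer to,
     * plus flags. The socket would be prepared once with
     *   setsockopt(fd, SOL_SOCKET, SO_TXTIME, &cfg, sizeof(cfg));
     */
    struct sock_txtime {
            clockid_t clockid;      /* e.g. CLOCK_TAI */
            uint32_t flags;
    };

    /* Send one frame with a per-packet launch time (in ns) attached. */
    static ssize_t send_at(int fd, const void *buf, size_t len,
                           uint64_t txtime_ns)
    {
            char control[CMSG_SPACE(sizeof(txtime_ns))] = { 0 };
            struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
            struct msghdr msg = {
                    .msg_iov = &iov, .msg_iovlen = 1,
                    .msg_control = control,
                    .msg_controllen = sizeof(control),
            };
            struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);

            cm->cmsg_level = SOL_SOCKET;
            cm->cmsg_type = SCM_TXTIME;
            cm->cmsg_len = CMSG_LEN(sizeof(txtime_ns));
            memcpy(CMSG_DATA(cm), &txtime_ns, sizeof(txtime_ns));

            return sendmsg(fd, &msg, 0);
    }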

It is currently tied to the OpenAvnu gPTP daemon rather than linuxptp, as it uses a shared memory interface
to get the current rate-ratio and offset information between the various clocks. There may be better ways
to do this, but that's how the initial port of the codebase was done. It would be nice to get it working
with linuxptp's userspace tools at some point as well, though.

The libraries under avtp_pipeline are designed to be used separately, but a simple integrated application
is provided and is built by the CI system.

In addition to OpenAvnu, Renesas has a number of github repositories with what looks like a fairly
complete media streaming system:

https://github.com/renesas-rcar/avb-mse
https://github.com/renesas-rcar/avb-streaming
https://github.com/renesas-rcar/avb-applications

I haven't examined them in great detail yet, though.


>  | Frame Preemption                               | P802.1Qbu           | DNE                          |
>  | Scheduled Traffic                              | P802.1Qbv           | RFC posted (SO_TXTIME)       |
>  | SRP Enhancements and Performance Improvements  | P802.1Qcc           | DNE                          |
> 
>  DNE = Does Not Exist (to my knowledge)

Although your SO_TXTIME proposal could certainly form the basis of an endpoint's implementation of Qbv, I
think it is a stretch to consider it a Qbv implementation in itself, if that's what you mean by this.

I have been working with colleagues on some experiments relating to a Linux-controlled DSA switch
(a Marvell Topaz) that are a part of this effort in TSN: 

http://ieee802.org/1/files/public/docs2017/tsn-cgunther-802-3cg-multidrop-0917-v01.pdf

The proper interfaces for the Qbv configuration and managing of switch-level PTP timestamps are not yet
in place, so there's nothing even at RFC stage to present yet, but Qbv-capable Linux-managed switch
hardware is available and we hope to get some reusable code published even if it's not yet ready to be
integrated in the kernel.

> 
> 1. https://github.com/Avnu/OpenAvnu
> 
>    (DISCLAIMER from the website:)
> 
>    It is planned to eventually include the various packet encapsulation types,
>    protocol discovery daemons, libraries to convert media clocks to AVB clocks
>    and vice versa, and drivers.
> 
>    This repository does not include all components required to build a full
>    production AVB/TSN system (e.g. a turnkey solution to stream stored or live audio
>    or video content). Some simple example applications are provided which
>    illustrate the flow - but a professional Audio/Video system requires a full media stack
>    - including audio and video inputs and outputs, media processing elements, and
>    various graphical user interfaces. Various companies provide such integrated
>    solutions.

A bit of progress has been made since that was written, although it is true that it's still not
quite complete and certainly not turnkey. The most glaring absence at the moment is the media
clock recovery portion of AVTP, but I am actively working on this.

> 
> 2. https://github.com/jdkoftinoff/jdksavdecc-c

This is pulled in as a dependency of the AVDECC code in OpenAvnu; it's used in the command line driven
controller, but not in the avtp_pipeline code that implements the endpoint AVDECC behavior. I don't think
either is complete by any means, but they are complete enough to be mostly compliant and usable in the
subset of behavior they support.

The bulk of the command line controller is a clone of: https://github.com/audioscience/avdecc-lib 

Things are maybe a bit farther along than they seemed, but there is still important kernel work to be
done to reduce the need for out-of-tree drivers and to get everyone on the same interfaces. I plan
to be an active participant going forward.


Levi

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-20  0:19       ` Vinicius Costa Gomes
@ 2017-09-20  5:25         ` Richard Cochran
  2017-10-18 22:37           ` Jesus Sanchez-Palencia
  2017-09-20  5:58         ` Richard Cochran
  1 sibling, 1 reply; 45+ messages in thread
From: Richard Cochran @ 2017-09-20  5:25 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, jhs, xiyou.wangcong, jiri, intel-wired-lan, andre.guedes,
	ivan.briano, jesus.sanchez-palencia, boon.leong.ong

On Tue, Sep 19, 2017 at 05:19:18PM -0700, Vinicius Costa Gomes wrote:
> One of the problems with OpenAVNU is that it's too coupled with the i210
> NIC. One of the things we want is to decouple OpenAVNU from the
> controller.

Yes, I want that, too.

> The way we thought best was to propose interfaces (that
> would work alongside the Linux networking stack) as close as
> possible to what the current standards define, that means the IEEE
> 802.1Q family of specifications, in the hope that network controller
> vendors would also look at the specifications when designing their
> controllers.

These standards define the *behavior*, not the programming APIs.  Our
task as kernel developers is to invent the best interfaces for
supporting 802.1Q and other standards, the hardware capabilities, and
the widest range of applications (not just AVB).

> Our objective with the Qdiscs we are proposing (both cbs and taprio) is
> to provide a sane way to configure controllers that support TSN features
> (we were looking specifically at the IEEE specs).

I can see how your proposed Qdiscs are inspired by the IEEE standards.
However, in the case of time based transmission, I think there is a
better way to do it, namely with SO_TXTIME (which BTW was originally
proposed by Eric Mann).
 
> After we have some rough consensus on the interfaces to use, then we can
> start working on OpenAVNU.

Did you see my table in the other mail?  Any comments?

> (Sorry if I am being annoying here, but the idea of an opaque schedule
> is not ours, that comes from the people who wrote the Qbv specification)

The schedule is easy to implement using SO_TXTIME.
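
For illustration (the arithmetic only; the names are invented and this is not
a proposed API), one way a talker could map a repeating gate schedule onto
per-packet transmission times is to pin each frame to the start of its class's
window in the current or next cycle:

    #include <stdint.h>

    /* The cycle starts at base_time_ns and repeats every cycle_ns; the
     * window for this traffic class opens window_offset_ns into each
     * cycle. Return the first window start that is not earlier than
     * now_ns. All values are nanoseconds on the PTP timescale.
     */
    static uint64_t next_window_start(uint64_t now_ns,
                                      uint64_t base_time_ns,
                                      uint64_t cycle_ns,
                                      uint64_t window_offset_ns)
    {
            uint64_t n, candidate;

            if (now_ns <= base_time_ns)
                    return base_time_ns + window_offset_ns;

            n = (now_ns - base_time_ns) / cycle_ns;
            candidate = base_time_ns + n * cycle_ns + window_offset_ns;
            if (candidate < now_ns)
                    candidate += cycle_ns;
            return candidate;
    }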
 
> I have a question, what about a controller that doesn't provide a way to
> set a per-packet transmission time, but it supports Qbv/Qbu. What would
> be your proposal to configure it?

SO_TXTIME will have a generic SW fallback.

BTW, regarding the i210, there is no sensible way to configure both
CBS and time based transmission at the same time.  The card performs a
logical AND to make the launch decision.  The effect of this is that
each and every packet needs a LaunchTime, and the driver would be
forced to guess the time for a packet before entering it into its
queue.

So if we end up merging CBS and SO_TXTIME, then we'll have to make
them exclusive of each other (in the case of the i210) and manage the
i210 queue configurations correctly.

> (I think LaunchTime is something specific to the i210, right?)

To my knowledge yes.  However, if TSN does take hold, then other MAC
vendors will copy it.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: TSN Scorecard, was Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-20  5:17   ` TSN Scorecard, was " levipearson
@ 2017-09-20  5:49     ` Richard Cochran
  2017-09-20 21:29       ` Jesus Sanchez-Palencia
  0 siblings, 1 reply; 45+ messages in thread
From: Richard Cochran @ 2017-09-20  5:49 UTC (permalink / raw)
  To: levipearson
  Cc: netdev, intel-wired-lan, vinicius.gomes, andre.guedes,
	ivan.briano, jesus.sanchez-palencia, boon.leong.ong, jhs,
	xiyou.wangcong, jiri, henrik

On Tue, Sep 19, 2017 at 11:17:54PM -0600, levipearson@gmail.com wrote:
> In addition to OpenAvnu, Renesas has a number of github repositories with what looks like a fairly
> complete media streaming system:

Is it a generic stack or a set of hacks for their HW?

> Although your SO_TXTIME proposal could certainly form the basis of an endpoint's implementation of Qbv, I
> think it is a stretch to consider it a Qbv implementation in itself, if that's what you mean by this.

No, that is not what I meant.  We need some minimal additional kernel
support in order to fully implement the TSN family of standards.  Of
course, the bulk will have to be done in user space.  It would be a
mistake to cram the stuff that belongs in userland into the kernel.

Looking at the table, and reading your descriptions of the state of
OpenAVB, I remained convinced that the kernel needs only three
additions:

1. SO_TXTIME
2. CBS Qdisc
3. ALSA support for DAC clock control (but that is another story)

> The proper interfaces for the Qbv configuration and managing of switch-level PTP timestamps are not yet
> in place, so there's nothing even at RFC stage to present yet, but Qbv-capable Linux-managed switch
> hardware is available and we hope to get some reusable code published even if it's not yet ready to be
> integrated in the kernel.

Right, configuring Qbv in an attached DSA switch needs its own
interface.

Regarding PHC support for DSA switches, I have something in the works
to be published soon.

> A bit of progress has been made since that was written, although it is true that it's still not
> quite complete and certainly not turnkey.

So OpenAVB is neither complete nor turnkey.  That was my impression,
too.

> Things are maybe a bit farther along than they seemed, but there is still important kernel work to be
> done to reduce the need for out-of-tree drivers and to get everyone on the same interfaces. I plan
> to be an active participant going forward.

You mentioned a couple of different kernel things you implemented.
I would encourage you to post the work already done.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-20  1:59 ` levipearson
@ 2017-09-20  5:56   ` Richard Cochran
  0 siblings, 0 replies; 45+ messages in thread
From: Richard Cochran @ 2017-09-20  5:56 UTC (permalink / raw)
  To: levipearson
  Cc: vinicius.gomes, netdev, intel-wired-lan, andre.guedes,
	ivan.briano, jesus.sanchez-palencia, boon.leong.ong, jhs,
	xiyou.wangcong, jiri, henrik

On Tue, Sep 19, 2017 at 07:59:11PM -0600, levipearson@gmail.com wrote:
> If some endpoint device shows up with direct Qbv support, this interface would
> probably work well there too, although a talker would need to be able to
> schedule its transmits pretty precisely to achieve the lowest possible latency.

This is an argument for SO_TXTIME.

> One concern here is calling the base-time parameter an interval; it's really
> an absolute time with respect to the PTP timescale. Good documentation will
> be important to this one, since the specification discusses some subtleties
> regarding the impact of different time values chosen here.
> 
> The format for specifying the actual intervals such as cycle-time could prove
> to be an important detail as well; Qbv specifies cycle-time as a ratio of two
> integers expressed in seconds, while extension-time is specified as an integer
> number of nanoseconds.
> 
> Precision with the cycle-time is especially important, since base-time can be
> almost arbitrarily far in the past or future, and any given cycle start should
> be calculable from the base-time plus/minus some integer multiple of cycle-
> time.

The above three points also.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-20  0:19       ` Vinicius Costa Gomes
  2017-09-20  5:25         ` Richard Cochran
@ 2017-09-20  5:58         ` Richard Cochran
  1 sibling, 0 replies; 45+ messages in thread
From: Richard Cochran @ 2017-09-20  5:58 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: netdev, jhs, xiyou.wangcong, jiri, intel-wired-lan, andre.guedes,
	ivan.briano, jesus.sanchez-palencia, boon.leong.ong

On Tue, Sep 19, 2017 at 05:19:18PM -0700, Vinicius Costa Gomes wrote:
> (I think LaunchTime is something specific to the i210, right?)

Levi just told us:

   Recent SoCs from NXP (the i.MX 6 SoloX, and all the i.MX 7 and 8
   parts) support Qav shaping as well as scheduled launch
   functionality;

Thanks,
Richard

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: TSN Scorecard, was Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-20  5:49     ` Richard Cochran
@ 2017-09-20 21:29       ` Jesus Sanchez-Palencia
  0 siblings, 0 replies; 45+ messages in thread
From: Jesus Sanchez-Palencia @ 2017-09-20 21:29 UTC (permalink / raw)
  To: Richard Cochran, levipearson
  Cc: netdev, intel-wired-lan, vinicius.gomes, andre.guedes,
	ivan.briano, boon.leong.ong, jhs, xiyou.wangcong, jiri, henrik

Hi,


On 09/19/2017 10:49 PM, Richard Cochran wrote:
(...)

> 
> No, that is not what I meant.  We need some minimal additional kernel
> support in order to fully implement the TSN family of standards.  Of
> course, the bulk will have to be done in user space.  It would be a
> mistake to cram the stuff that belongs in userland into the kernel.
> 
> Looking at the table, and reading your descriptions of the state of
> OpenAVB, I remained convinced that the kernel needs only three
> additions:
> 
> 1. SO_TXTIME
> 2. CBS Qdisc
> 3. ALSA support for DAC clock control (but that is another story)


We'll be posting the CBS v1 series for review soon.

The current SO_TXTIME RFC for the purpose of Launchtime looks great, and we are
looking forward to the v1 + its companion qdisc so we can test / review and
provide feedback.

We are still under the impression that a config interface for HW offload of Qbv
/ Qbu config will be needed, but we'll be deferring the 'taprio' proposal until
NICs (end stations) that support these standards are available. We can
revisit it if that ever happens, and if it's still needed, but then taking into
account SO_TXTIME (and its related qdisc).

Thanks everyone for all the feedback so far.

Regards,
Jesus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-20  5:25         ` Richard Cochran
@ 2017-10-18 22:37           ` Jesus Sanchez-Palencia
  2017-10-19 20:39             ` Richard Cochran
  0 siblings, 1 reply; 45+ messages in thread
From: Jesus Sanchez-Palencia @ 2017-10-18 22:37 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Vinicius Costa Gomes, netdev, jhs, xiyou.wangcong, jiri,
	intel-wired-lan, andre.guedes, ivan.briano, boon.leong.ong,
	Levi Pearson, Henrik Austad

Hi Richard,


On 09/19/2017 10:25 PM, Richard Cochran wrote:
(...)
>  
>> I have a question, what about a controller that doesn't provide a way to
>> set a per-packet transmission time, but it supports Qbv/Qbu. What would
>> be your proposal to configure it?
> 
> SO_TXTIME will have a generic SW fallback.
> 
> BTW, regarding the i210, there is no sensible way to configure both
> CBS and time based transmission at the same time.  The card performs a
> logical AND to make the launch decision.  The effect of this is that
> each and every packet needs a LaunchTime, and the driver would be
> forced to guess the time for a packet before entering it into its
> queue.
> 
> So if we end up merging CBS and SO_TXTIME, then we'll have to make
> them exclusive of each other (in the case of the i210) and manage the
> i210 queue configurations correctly.
> 

I've run some quick tests here with launch time enabled on the i210 + our cbs
patchset. When valid launch times are set on each packet you still get the
expected behavior, so I'm not sure we should just make them exclusive of each other.

I also did some tests for the case where valid launch times are not set, here
using your idea from above: the driver calculates a valid launch time (i.e.
current NIC time + X ns, varying X across tests) for packets that didn't have
one set by the user, and I wasn't too happy with its reliability. It could
definitely be improved, but it has left me wondering: instead, what about
documenting that if you enable TXTIME, then you *must* provide a valid launch
time for all packets on the traffic classes that are affected?

With the SO_TXTIME qdisc idea in place, that could even be enforced before
packets were enqueued into the netdevice.


Regards,
Jesus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-10-18 22:37           ` Jesus Sanchez-Palencia
@ 2017-10-19 20:39             ` Richard Cochran
  2017-10-23 17:18               ` Jesus Sanchez-Palencia
  0 siblings, 1 reply; 45+ messages in thread
From: Richard Cochran @ 2017-10-19 20:39 UTC (permalink / raw)
  To: Jesus Sanchez-Palencia
  Cc: Vinicius Costa Gomes, netdev, jhs, xiyou.wangcong, jiri,
	intel-wired-lan, andre.guedes, ivan.briano, boon.leong.ong,
	Levi Pearson, Henrik Austad

On Wed, Oct 18, 2017 at 03:37:35PM -0700, Jesus Sanchez-Palencia wrote:
> I also did some tests for the case where valid launch times are not set, here
> using your idea from above: the driver calculates a valid launch time (i.e.
> current NIC time + X ns, varying X across tests) for packets that didn't have
> one set by the user, and I wasn't too happy with its reliability. It could
> definitely be improved, but it has left me wondering: instead, what about
> documenting that if you enable TXTIME, then you *must* provide a valid launch
> time for all packets on the traffic classes that are affected?

If txtime is enabled, then CBS is pointless because the txtime already
specifies the bandwidth implicitly.

The problem is when one program uses txtime and another uses CBS, then
the CBS user will experience really wrong performance.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-10-19 20:39             ` Richard Cochran
@ 2017-10-23 17:18               ` Jesus Sanchez-Palencia
  0 siblings, 0 replies; 45+ messages in thread
From: Jesus Sanchez-Palencia @ 2017-10-23 17:18 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Vinicius Costa Gomes, netdev, jhs, xiyou.wangcong, jiri,
	intel-wired-lan, andre.guedes, ivan.briano, boon.leong.ong,
	Levi Pearson, Henrik Austad

Hi,

On 10/19/2017 01:39 PM, Richard Cochran wrote:
> On Wed, Oct 18, 2017 at 03:37:35PM -0700, Jesus Sanchez-Palencia wrote:
>> I also did some tests for the case where valid launch times are not set, here
>> using your idea from above: the driver calculates a valid launch time (i.e.
>> current NIC time + X ns, varying X across tests) for packets that didn't have
>> one set by the user, and I wasn't too happy with its reliability. It could
>> definitely be improved, but it has left me wondering: instead, what about
>> documenting that if you enable TXTIME, then you *must* provide a valid launch
>> time for all packets on the traffic classes that are affected?
> 
> If txtime is enabled, then CBS is pointless because the txtime already
> specifies the bandwidth implicitly.


Assuming there is no "interfering" traffic on that traffic class, yes.
Otherwise, CBS could be configured just to keep outbound traffic from ever going
beyond the reserved bandwidth.


> 
> The problem is when one program uses txtime and another uses CBS, then
> the CBS user will experience really wrong performance.


Good point. We'll need to adjust the launch time for controllers that behave
like the i210 then, imo.


Thanks,
Jesus


> 
> Thanks,
> Richard
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-10-02 18:45 ` Levi Pearson
  2017-10-02 19:40   ` Rodney Cummings
@ 2017-10-02 23:06   ` Guedes, Andre
  1 sibling, 0 replies; 45+ messages in thread
From: Guedes, Andre @ 2017-10-02 23:06 UTC (permalink / raw)
  To: levipearson, rodney.cummings
  Cc: Sanchez-Palencia, Jesus, netdev, Gomes, Vinicius, Briano, Ivan,
	richardcochran, henrik

[-- Attachment #1: Type: text/plain, Size: 10530 bytes --]

Hi all,

On Mon, 2017-10-02 at 12:45 -0600, Levi Pearson wrote:
> Hi Rodney,
> 
> Some archives seem to have threaded it, but I have CC'd the
> participants I saw in the original discussion thread since they may
> not otherwise notice it amongst the normal traffic.
> 
> On Fri, Sep 29, 2017 at 2:44 PM, Rodney Cummings <rodney.cummings@ni.com>
> wrote:

[...]

> > 1. Question: From an 802.1 perspective, is this RFC intended to support
> > end-station (e.g. NIC in host), bridges (i.e. DSA), or both?
> > 
> > This is very important to clarify, because the usage of this interface
> > will be very different for one or the other.
> > 
> > For a bridge, the user code typically represents a remote management
> > protocol (e.g. SNMP, NETCONF, RESTCONF), and this interface is
> > expected to align with the specifications of 802.1Q clause 12,
> > which serves as the information model for management. Historically,
> > a standard kernel interface for management hasn't been viewed as
> > essential, but I suppose it wouldn't hurt.
> 
> I don't think the proposal was meant to cover the case of non-local
> switch hardware, but in addition to dsa and switchdev switch ICs
> managed by embedded Linux-running SoCs, there are SoCs with embedded
> small port count switches or even plain multiple NICs with software
> bridging. Many of these embedded small port count switches have FQTSS
> hardware that could potentially be configured by the proposed cbs
> qdisc. This blurs the line somewhat between what is a "bridge" and
> what is an "end-station" in 802.1Q terminology, but nevertheless these
> devices exist, sometimes acting as an endpoint + a real bridge and
> sometimes as just a system with multiple network interfaces.

During the development of this proposal, we were most focused on end-station
use-cases. We considered some bridge use-cases as well just to verify that the
proposed design wouldn't be an issue if someone else goes for it.

We agree that the line between end-station and bridge can be a bit blurred (in
this case). Even though we designed this interface with end-station use-cases
in mind, it would be a welcome bonus if the proposed infrastructure could be
used as-is in bridge use-cases.

> > For an end station, the user code can be an implementation of SRP
> > (802.1Q clause 35), or it can be an application-specific
> > protocol (e.g. industrial fieldbus) that exchanges data according
> > to P802.1Qcc clause 46. Either way, the top-level user interface
> > is designed for individual streams, not queues and shapers. That
> > implies some translation code between that top-level interface
> > and this sort of kernel interface.

Yes, you're right. Our understanding is that the top-level interfaces, as well
as any stream management functionality, should be implemented in user space.
The idea here is to keep the kernel side as simple as possible. The kernel
handles hardware configuration (via the Traffic Control interface) while user
space handles TSN streams, i.e. the kernel provides the mechanism and user
space provides the policy.

> > As a specific end-station example, for CBS, 802.1Q-2014 subclause
> > 34.6.1 requires "per-stream queues" in the Talker end-station.
> > I don't see 34.6.1 represented in the proposed RFC, but that's
> > okay... maybe per-stream queues are implemented in user code.
> > Nevertheless, if that is the assumption, I think we need to
> > clarify, especially in examples.
> 
> You're correct that the FQTSS credit-based shaping algorithm requires
> per-stream shaping by Talker endpoints as well, but this is in
> addition to the per-class shaping provided by most hardware shaping
> implementations that I'm aware of in endpoint network hardware. I
> agree that we need to document the need to provide this, but it can
> definitely be built on top of the current proposal.
> 
> I believe the per-stream shaping could be managed either by a user
> space application that manages all use of a streaming traffic class,
> or through an additional qdisc module that performs per-stream
> management on top of the proposed cbs qdisc, ensuring that the
> frames-per-observation interval aspect of each stream's reservation is
> obeyed. This becomes a fairly simple qdisc to implement on top of a
> per-traffic class shaper, and could even be implemented with the help
> of the timestamp that the SO_TXTIME proposal adds to skbuffs, but I
> think keeping the layers separate provides more flexibility to
> implementations and keeps management of various kinds of hardware
> offload support simpler as well.

Indeed, 'per-stream queue' is not covered in this RFC. For now, we expect it to
be implemented in user code. We believe the proposed CBS qdisc could be
extended to support a full software-based implementation which would be used to
implement 'per-stream queue' support. This functionality should be addressed by
a separate series.

Anyway, we're about to send the v3 patchset implementing this proposal and
we'll make it clear.

> > 2. Suggestion: Do not assume that a time-aware (i.e. scheduled)
> > end-station will always use 802.1Qbv.
> > 
> > For those who are subscribed to the 802.1 mailing list,
> > I'd suggest a read of draft P802.1Qcc/D1.6, subclause U.1
> > of Annex U. Subclause U.1 assumes that bridges in the network use
> > 802.1Qbv, and then it poses the question of what an end-station
> > Talker should do. If the end-station also uses 802.1Qbv,
> > and that end-station transmits multiple streams, 802.1Qbv is
> > a bad implementation. The reason is that the scheduling
> > (i.e. order in time) of each stream cannot be controlled, which
> > in turn means that the CNC (network manager) cannot optimize
> > the 802.1Qbv schedules in bridges. The preferred technique
> > is to use "per-stream scheduling" in each Talker, so that
> > the CNC can create optimal schedules (i.e. best determinism).
> > 
> > I'm aware of a small number of proprietary CNC implementations for
> > 802.1Qbv in bridges, and they are generally assuming per-stream
> > scheduling in end-stations (Talkers).
> > 
> > The i210 NIC's LaunchTime can be used to implement per-stream
> > scheduling. I haven't looked at SO_TXTIME in detail, but it sounds
> > like per-stream scheduling. If so, then we already have the
> > fundamental building blocks for a complete implementation
> > of a time-aware end-station.
> > 
> > If we answer the preceding question #1 as "end-station only",
> > I would recommend avoiding 802.1Qbv in this interface. There
> > isn't really anything wrong with it per-se, but it would lead
> > developers down the wrong path.
> 
> In some situations, such as device nodes that each incorporate a small
> port count switch for the purpose of daisy-chaining a segment of the
> network, "end stations" must do a limited subset of local bridge
> management as well. I'm not sure how common this is going to be for
> industrial control applications, but I know there are audio and
> automotive applications built this way.
> 
> One particular device I am working with now provides all network
> access through a DSA switch chip with hardware Qbv support in addition
> to hardware Qav support. The SoC attached to it has no hardware timed
> launch (SO_TXTIME) support. In this case, although the proposed
> interface for Qbv is not *sufficient* to make a working time-aware end
> station, it does provide a usable building block to provide one. As
> with the credit-based shaping system, Talkers must provide an
> additional level of per-stream shaping as well, but this is largely
> (absent the jitter calculations, which are sort of a middle-level
> concern) independent of what sort of hardware offload of the
> scheduling is provided.
> 
> Both Qbv windows and timed launch support do roughly the same thing;
> they *delay* the launch of a hardware-queued frame so it can egress at
> a precisely specified time, and at least with the i210 and Qbv, ensure
> that no other traffic will be in-progress when that time arrives. For
> either to be used effectively, the application still has to prepare
> the frame slightly ahead-of-time and thus must have the same level of
> time-awareness. This is, again, largely independent of what kind of
> hardware offloading support is provided and is also largely
> independent of the network stack itself. Neither queue window
> management nor SO_TXTIME help the application present its
> time-sensitive traffic at the right time; that's a matter to be worked
> out with the application taking advantage of PTP and the OS scheduler.
> Whether you rely on managed windows or hardware launch time to provide
> the precisely correct amount of delay beyond that is immaterial to the
> application. In the absence of SO_TXTIME offloading (or even with it,
> and in the presence of sufficient OS scheduling jitter), an additional
> layer may need to be provided to ensure different applications' frames
> are queued in the correct order for egress during the window. Again,
> this could be a purely user-space application multiplexer or a
> separate qdisc module.
> 
> I wholeheartedly agree with you and Richard that we ought to
> eventually provide application-level APIs that don't require users to
> have deep knowledge of various 802.1Q intricacies. But I believe that
> the hardware offloading capability being provided now, and the variety
> of the way things are hooked up in real hardware, suggests that we
> ought to also build the support for the underlying protocols in layers
> so that we don't create unnecessary mismatches between offloading
> capability (which can be essential to overall network performance) and
> APIs, such that one configuration of offload support is privileged
> above others even when comparable scheduling accuracy could be
> provided by either.
> 
> In any case, only the cbs qdisc has been included in the post-RFC
> patch cover page for its last couple of iterations, so there is plenty
> of time to discuss how time-aware shaping, preemption, etc. management
> should occur beyond the cbs and SO_TXTIME proposals.

Yes, based on the previous feedback about the Qbv offloading interface
('taprio'), we've decided to postpone its proposal until we have NICs
supporting Qbv and more realistic use-cases. The current proposal covers only
FQTSS.

Thanks for your feedback!

Best regards,

Andre

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 3262 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-10-02 21:48     ` Levi Pearson
@ 2017-10-02 22:52       ` Rodney Cummings
  0 siblings, 0 replies; 45+ messages in thread
From: Rodney Cummings @ 2017-10-02 22:52 UTC (permalink / raw)
  To: Levi Pearson
  Cc: Linux Kernel Network Developers, Vinicius Costa Gomes,
	Henrik Austad, richardcochran, jesus.sanchez-palencia,
	andre.guedes

Thanks Levi,

Great discussion. It sounds like we're aligned.

Qbv is essential in switches. My concern was the end-station side only.

You're absolutely right that i210 LaunchTime doesn't do everything that is needed 
for scheduling. It still requires quite a bit of frame-juggling to ensure that
transmit descriptors are ordered by time, and that the control system's cyclic
traffic is repeated as expected. Like you, I'm not sure where that code is best
located in the long run.

For a hardware design that connects an FPGA to one of the internal ports of a
Qbv-capable switch, the preferable scheduling design can be done there.
The FPGA IP essentially implements a cyclic schedule of transmit descriptors,
each with its own buffer that can be written atomically by application code.
For a control system (i.e. industrial field-level or automotive control),
each buffer (stream) can hold a single frame. If you take a look at the
"Network Interface" specs for FlexRay in automotive, they went so far as to
mandate that design for a NIC. The IP for that sort of thing
doesn't take a lot of gates or memory, so I can see why FlexRay went that way.

That sort of design wasn't destined to happen as part of Qbv, because the switch
vendors are worried about things like 64-port switches. Also, if we treat Qbv
as applying to switches only, FlexRay-style IP isn't needed. For the switch,
we only need it to "get out of the way", and Qbv is fine for that.

I may be naive and idealistic, but in the long run, I'm still hopeful that an
Ethernet NIC vendor will implement FlexRay-style scheduling for end-stations.
We need a NIC trend-setter to do it as a differentiator, and the rest will 
probably follow along. That completely eliminates code in Linux for the 
frame-juggling. We'll need to provide a way to open a socket for a specific 
stream, but that doesn't seem too challenging.

Rodney




> -----Original Message-----
> From: Levi Pearson [mailto:levipearson@gmail.com]
> Sent: Monday, October 2, 2017 4:49 PM
> To: Rodney Cummings <rodney.cummings@ni.com>
> Cc: Linux Kernel Network Developers <netdev@vger.kernel.org>; Vinicius
> Costa Gomes <vinicius.gomes@intel.com>; Henrik Austad <henrik@austad.us>;
> richardcochran@gmail.com; jesus.sanchez-palencia@intel.com;
> andre.guedes@intel.com
> Subject: Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for
> traffic shapers
> 
> Hi Rodney,
> 
> On Mon, Oct 2, 2017 at 1:40 PM, Rodney Cummings <rodney.cummings@ni.com>
> wrote:
> 
> > It's a shame that someone built such hardware. Speaking as a
> manufacturer
> > of daisy-chainable products for industrial/automotive applications, I
> > wouldn't use that hardware in my products. The whole point of scheduling
> > is to obtain close-to-optimal network determinism. If the hardware
> > doesn't schedule properly, the product is probably better off using CBS.
> 
> I should note that the hardware I'm using was not designed to be a TSN
> endpoint. It just happens that the hardware, designed for other
> reasons, happens to have hardware support for Qbv and thus makes it an
> interesting and inexpensive platform on which to do Qbv experiments.
> Nevertheless, I believe the right combination of driver support for
> the hardware shaping and appropriate software in the network stack
> would allow it to schedule things fairly precisely in the
> application-centric manner you've described. I will ultimately defer
> to your judgement on that, though, and spend some time with the Qcc
> draft to see where I may be wrong.
> 
> > It would be great if all of the user-level application code was
> scheduled
> > as tightly as the network, but the reality is that we aren't there yet.
> > Also, even if the end-station has a separately timed RT Linux thread for
> > each stream, the order of those threads can vary. Therefore, when the
> > end-station (SoC) has more than one talker, the frames for each talker
> > can be written into the Qbv queue at any time, in any order.
> > There is no scheduling of frames (i.e. streams)... only the queue.
> 
> Frames are presented to *the kernel* in any order. The qdisc
> infrastructure provides mechanisms by which other scheduling policy
> can be added between presentation to the kernel and presentation to
> hardware queues.
> 
> Even with i210-style LaunchTime functionality, this must also be
> provided. The hardware cannot re-order time-scheduled frames, and
> reviewers of the SO_TXTIME proposal rejected the idea of having the
> driver try to re-order scheduled frames already submitted to the
> hardware-visible descriptor chains, and rightly so I think.
> 
> So far, I have thought of 3 places this ordering constraint might be
> provided:
> 
> 1. Built into each driver with offload support for SO_TXTIME
> 2. A qdisc module that will re-order scheduled skbuffs within a certain
> window
> 3. Left to userspace to coordinate
> 
> I think 1 is particularly bad due to duplication of code. 3 is the
> easy default from the point of view of the kernel, but not attractive
> from the point of view of precise scheduling and low latency to
> egress. Unless someone has a better idea, I think 2 would be the most
> appropriate choice. I can think of a number of ways one might be
> designed, but I have not thought them through fully yet.
> 
> > From the perspective of the CNC, that means that a Qbv-only end-station
> is
> > sloppy. In contrast, an end-station using per-stream scheduling
> > (e.g. i210 with LaunchTime) is precise, because each frame has a known
> time.
> > It's true that P802.1Qcc supports both, so the CNC might support both.
> > Nevertheless, the CNC is likely to compute a network-wide schedule
> > that optimizes for the per-stream end-stations, and locates the windows
> > for the Qbv-only end-stations in a slower interval.
> 
> Because out-of-order submission of frames to LaunchTime hardware would
> result in the frame that is enqueued later but scheduled earlier only
> transmitting once the later-scheduled frame completes, enqueuing
> frames immediately with LaunchTime information results in even *worse*
> sloppiness if two applications don't always submit their frames in the
> correct order to the kernel during a cycle. If your queue is simply
> gated at the time of the first scheduled launch, the frame that should
> have been first will only be late by the length of the queued-first
> frame. If they both have precise launch times, the frame that should
> have been first will have to wait until the normal launch time of the
> later-scheduled frame plus the time it takes for it to egress!
> 
> In either case, re-ordering is required if it is not controlled for in
> userspace. In other words, I am not disagreeing with you about the
> insufficiency of a Qbv shaper for an endpoint! I am just pointing out
> that neither hardware LaunchTime support nor hardware Qbv window
> support are sufficient by themselves. Nor, for that case, is the CBS
> offload support sufficient to meet multi-stream requirements from SRP.
> All of these provide a low-layer mechanism by which appropriate
> scheduling of a multi-stream or multi-application endpoint can be
> enhanced, but none of them provide the complete story by themselves.
> 
> > I realize that we are only doing CBS for now. I guess my main point is
> that
> > if/when we add scheduling, it cannot be limited to Qbv. Per-stream
> scheduling
> > is essential, because that is the assumption for most CNC designs that
> > I'm aware of.
> 
> I understand and completely agree. I have believed that per-stream
> shaping/scheduling in some form is essential from the beginning.
> Likewise, CBS is not sufficient in itself for multi-stream systems in
> which multiple applications provide streams scheduled for the same
> traffic class. But CBS and SO_TXTIME are good first steps; they
> provide hooks for drivers to finally support the mechanisms provided
> by hardware, and also in the case of SO_TXTIME a hook on which further
> time-sensitive enhancements to the network stack can be hung, such as
> qdiscs that can store and release frames based on scheduled launch
> times. When these are in place, I think stream-based scheduling via
> qdisc would be a reasonable next step; perhaps a common design could
> be found to work both for media streams over CBS and industrial
> streams scheduled by launch time.
> 
> 
> Levi

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-10-02 19:40   ` Rodney Cummings
@ 2017-10-02 21:48     ` Levi Pearson
  2017-10-02 22:52       ` Rodney Cummings
  0 siblings, 1 reply; 45+ messages in thread
From: Levi Pearson @ 2017-10-02 21:48 UTC (permalink / raw)
  To: Rodney Cummings
  Cc: Linux Kernel Network Developers, Vinicius Costa Gomes,
	Henrik Austad, richardcochran, jesus.sanchez-palencia,
	andre.guedes

Hi Rodney,

On Mon, Oct 2, 2017 at 1:40 PM, Rodney Cummings <rodney.cummings@ni.com> wrote:

> It's a shame that someone built such hardware. Speaking as a manufacturer
> of daisy-chainable products for industrial/automotive applications, I
> wouldn't use that hardware in my products. The whole point of scheduling
> is to obtain close-to-optimal network determinism. If the hardware
> doesn't schedule properly, the product is probably better off using CBS.

I should note that the hardware I'm using was not designed to be a TSN
endpoint. It just happens that the hardware, designed for other
reasons, happens to have hardware support for Qbv and thus makes it an
interesting and inexpensive platform on which to do Qbv experiments.
Nevertheless, I believe the right combination of driver support for
the hardware shaping and appropriate software in the network stack
would allow it to schedule things fairly precisely in the
application-centric manner you've described. I will ultimately defer
to your judgement on that, though, and spend some time with the Qcc
draft to see where I may be wrong.

> It would be great if all of the user-level application code was scheduled
> as tightly as the network, but the reality is that we aren't there yet.
> Also, even if the end-station has a separately timed RT Linux thread for
> each stream, the order of those threads can vary. Therefore, when the
> end-station (SoC) has more than one talker, the frames for each talker
> can be written into the Qbv queue at any time, in any order.
> There is no scheduling of frames (i.e. streams)... only the queue.

Frames are presented to *the kernel* in any order. The qdisc
infrastructure provides mechanisms by which other scheduling policy
can be added between presentation to the kernel and presentation to
hardware queues.

Even with i210-style LaunchTime functionality, this must also be
provided. The hardware cannot re-order time-scheduled frames, and
reviewers of the SO_TXTIME proposal rejected the idea of having the
driver try to re-order scheduled frames already submitted to the
hardware-visible descriptor chains, and rightly so I think.

So far, I have thought of 3 places this ordering constraint might be provided:

1. Built into each driver with offload support for SO_TXTIME
2. A qdisc module that will re-order scheduled skbuffs within a certain window
3. Left to userspace to coordinate

I think 1 is particularly bad due to duplication of code. 3 is the
easy default from the point of view of the kernel, but not attractive
from the point of view of precise scheduling and low latency to
egress. Unless someone has a better idea, I think 2 would be the most
appropriate choice. I can think of a number of ways one might be
designed, but I have not thought them through fully yet.
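
To illustrate option 2 only (plain user-space-style C with invented names, not
actual qdisc code), the core of such a module is just a buffer kept ordered by
launch time, released to the driver shortly before each frame's time:

    #include <stddef.h>
    #include <stdint.h>

    struct timed_frame {
            uint64_t txtime_ns;
            struct timed_frame *next;
            /* frame payload omitted */
    };

    /* Insert in ascending launch-time order; O(n) per enqueue, which
     * is acceptable for the small windows involved here.
     */
    static void enqueue_ordered(struct timed_frame **head,
                                struct timed_frame *f)
    {
            while (*head && (*head)->txtime_ns <= f->txtime_ns)
                    head = &(*head)->next;
            f->next = *head;
            *head = f;
    }

    /* Hand the earliest frame to the hardware once we are within
     * lead_ns of its launch time; anything further out stays buffered,
     * so a frame enqueued later but scheduled earlier can still jump
     * ahead of it.
     */
    static struct timed_frame *dequeue_due(struct timed_frame **head,
                                           uint64_t now_ns,
                                           uint64_t lead_ns)
    {
            struct timed_frame *f = *head;

            if (!f || f->txtime_ns > now_ns + lead_ns)
                    return NULL;
            *head = f->next;
            return f;
    }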

> From the perspective of the CNC, that means that a Qbv-only end-station is
> sloppy. In contrast, an end-station using per-stream scheduling
> (e.g. i210 with LaunchTime) is precise, because each frame has a known time.
> It's true that P802.1Qcc supports both, so the CNC might support both.
> Nevertheless, the CNC is likely to compute a network-wide schedule
> that optimizes for the per-stream end-stations, and locates the windows
> for the Qbv-only end-stations in a slower interval.

Because out-of-order submission of frames to LaunchTime hardware would
result in the frame that is enqueued later but scheduled earlier only
transmitting once the later-scheduled frame completes, enqueuing
frames immediately with LaunchTime information results in even *worse*
sloppiness if two applications don't always submit their frames in the
correct order to the kernel during a cycle. If your queue is simply
gated at the time of the first scheduled launch, the frame that should
have been first will only be late by the length of the queued-first
frame. If they both have precise launch times, the frame that should
have been first will have to wait until the normal launch time of the
later-scheduled frame plus the time it takes for it to egress!
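
To put made-up numbers on that: say frame B should launch at T+50us and frame
A at T+100us, each taking roughly 12us on the wire (1500 bytes at 1Gb/s), and
A happens to be enqueued first. With a simple gated queue, the gate opens at
T+50us, A goes out first, and B is only about 12us late. With precise
per-frame launch times and no re-ordering, the hardware holds A until T+100us,
transmits it, and only then can B go, so B ends up roughly 62us late.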

In either case, re-ordering is required if it is not controlled for in
userspace. In other words, I am not disagreeing with you about the
insufficiency of a Qbv shaper for an endpoint! I am just pointing out
that neither hardware LaunchTime support nor hardware Qbv window
support are sufficient by themselves. Nor, for that case, is the CBS
offload support sufficient to meet multi-stream requirements from SRP.
All of these provide a low-layer mechanism by which appropriate
scheduling of a multi-stream or multi-application endpoint can be
enhanced, but none of them provide the complete story by themselves.

> I realize that we are only doing CBS for now. I guess my main point is that
> if/when we add scheduling, it cannot be limited to Qbv. Per-stream scheduling
> is essential, because that is the assumption for most CNC designs that
> I'm aware of.

I understand and completely agree. I have believed that per-stream
shaping/scheduling in some form is essential from the beginning.
Likewise, CBS is not sufficient in itself for multi-stream systems in
which multiple applications provide streams scheduled for the same
traffic class. But CBS and SO_TXTIME are good first steps; they
provide hooks for drivers to finally support the mechanisms provided
by hardware, and also in the case of SO_TXTIME a hook on which further
time-sensitive enhancements to the network stack can be hung, such as
qdiscs that can store and release frames based on scheduled launch
times. When these are in place, I think stream-based scheduling via
qdisc would be a reasonable next step; perhaps a common design could
be found to work both for media streams over CBS and industrial
streams scheduled by launch time.


Levi

^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-10-02 18:45 ` Levi Pearson
@ 2017-10-02 19:40   ` Rodney Cummings
  2017-10-02 21:48     ` Levi Pearson
  2017-10-02 23:06   ` Guedes, Andre
  1 sibling, 1 reply; 45+ messages in thread
From: Rodney Cummings @ 2017-10-02 19:40 UTC (permalink / raw)
  To: Levi Pearson
  Cc: Linux Kernel Network Developers, Vinicius Costa Gomes,
	Henrik Austad, richardcochran, jesus.sanchez-palencia,
	andre.guedes

Thanks Levi,

> One particular device I am working with now provides all network
> access through a DSA switch chip with hardware Qbv support in addition
> to hardware Qav support. The SoC attached to it has no hardware timed
> launch (SO_TXTIME) support. In this case, although the proposed
> interface for Qbv is not *sufficient* to make a working time-aware end
> station, it does provide a usable building block to provide one. As
> with the credit-based shaping system, Talkers must provide an
> additional level of per-stream shaping as well, but this is largely
> (absent the jitter calculations, which are sort of a middle-level
> concern) independent of what sort of hardware offload of the
> scheduling is provided.

It's a shame that someone built such hardware. Speaking as a manufacturer
of daisy-chainable products for industrial/automotive applications, I
wouldn't use that hardware in my products. The whole point of scheduling
is to obtain close-to-optimal network determinism. If the hardware
doesn't schedule properly, the product is probably better off using CBS.

It would be great if all of the user-level application code was scheduled
as tightly as the network, but the reality is that we aren't there yet.
Also, even if the end-station has a separately timed RT Linux thread for
each stream, the order of those threads can vary. Therefore, when the 
end-station (SoC) has more than one talker, the frames for each talker 
can be written into the Qbv queue at any time, in any order. 
There is no scheduling of frames (i.e. streams)... only the queue.

From the perspective of the CNC, that means that a Qbv-only end-station is
sloppy. In contrast, an end-station using per-stream scheduling 
(e.g. i210 with LaunchTime) is precise, because each frame has a known time.
It's true that P802.1Qcc supports both, so the CNC might support both.
Nevertheless, the CNC is likely to compute a network-wide schedule
that optimizes for the per-stream end-stations, and locates the windows
for the Qbv-only end-stations in a slower interval.

I realize that we are only doing CBS for now. I guess my main point is that
if/when we add scheduling, it cannot be limited to Qbv. Per-stream scheduling
is essential, because that is the assumption for most CNC designs that 
I'm aware of.

Rodney

> -----Original Message-----
> From: Levi Pearson [mailto:levipearson@gmail.com]
> Sent: Monday, October 2, 2017 1:46 PM
> To: Rodney Cummings <rodney.cummings@ni.com>
> Cc: Linux Kernel Network Developers <netdev@vger.kernel.org>; Vinicius
> Costa Gomes <vinicius.gomes@intel.com>; Henrik Austad <henrik@austad.us>;
> richardcochran@gmail.com; jesus.sanchez-palencia@intel.com;
> andre.guedes@intel.com
> Subject: Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for
> traffic shapers
> 
> Hi Rodney,
> 
> Some archives seem to have threaded it, but I have CC'd the
> participants I saw in the original discussion thread since they may
> not otherwise notice it amongst the normal traffic.
> 
> On Fri, Sep 29, 2017 at 2:44 PM, Rodney Cummings <rodney.cummings@ni.com>
> wrote:
> > Hi,
> >
> > I am posting my reply to this thread after subscribing, so I apologize
> > if the archive happens to attach it to the wrong thread.
> >
> > First, I'd like to say that I strongly support this RFC.
> > We need Linux interfaces for IEEE 802.1 TSN features.
> >
> > Although I haven't looked in detail, the proposal for CBS looks good.
> > My questions/concerns are more related to future work, such for 802.1Qbv
> > (scheduled traffic).
> >
> > 1. Question: From an 802.1 perspective, is this RFC intended to support
> > end-station (e.g. NIC in host), bridges (i.e. DSA), or both?
> >
> > This is very important to clarify, because the usage of this interface
> > will be very different for one or the other.
> >
> > For a bridge, the user code typically represents a remote management
> > protocol (e.g. SNMP, NETCONF, RESTCONF), and this interface is
> > expected to align with the specifications of 802.1Q clause 12,
> > which serves as the information model for management. Historically,
> > a standard kernel interface for management hasn't been viewed as
> > essential, but I suppose it wouldn't hurt.
> 
> I don't think the proposal was meant to cover the case of non-local
> switch hardware, but in addition to dsa and switchdev switch ICs
> managed by embedded Linux-running SoCs, there are SoCs with embedded
> small port count switches or even plain multiple NICs with software
> bridging. Many of these embedded small port count switches have FQTSS
> hardware that could potentially be configured by the proposed cbs
> qdisc. This blurs the line somewhat between what is a "bridge" and
> what is an "end-station" in 802.1Q terminology, but nevertheless these
> devices exist, sometimes acting as an endpoint + a real bridge and
> sometimes as just a system with multiple network interfaces.
> 
> > For an end station, the user code can be an implementation of SRP
> > (802.1Q clause 35), or it can be an application-specific
> > protocol (e.g. industrial fieldbus) that exchanges data according
> > to P802.1Qcc clause 46. Either way, the top-level user interface
> > is designed for individual streams, not queues and shapers. That
> > implies some translation code between that top-level interface
> > and this sort of kernel interface.
> >
> > As a specific end-station example, for CBS, 802.1Q-2014 subclause
> > 34.6.1 requires "per-stream queues" in the Talker end-station.
> > I don't see 34.6.1 represented in the proposed RFC, but that's
> > okay... maybe per-stream queues are implemented in user code.
> > Nevertheless, if that is the assumption, I think we need to
> > clarify, especially in examples.
> 
> You're correct that the FQTSS credit-based shaping algorithm requires
> per-stream shaping by Talker endpoints as well, but this is in
> addition to the per-class shaping provided by most hardware shaping
> implementations that I'm aware of in endpoint network hardware. I
> agree that we need to document the need to provide this, but it can
> definitely be built on top of the current proposal.
> 
> I believe the per-stream shaping could be managed either by a user
> space application that manages all use of a streaming traffic class,
> or through an additional qdisc module that performs per-stream
> management on top of the proposed cbs qdisc, ensuring that the
> frames-per-observation interval aspect of each stream's reservation is
> obeyed. This becomes a fairly simple qdisc to implement on top of a
> per-traffic class shaper, and could even be implemented with the help
> of the timestamp that the SO_TXTIME proposal adds to skbuffs, but I
> think keeping the layers separate provides more flexibility to
> implementations and keeps management of various kinds of hardware
> offload support simpler as well.
> 
> > 2. Suggestion: Do not assume that a time-aware (i.e. scheduled)
> > end-station will always use 802.1Qbv.
> >
> > For those who are subscribed to the 802.1 mailing list,
> > I'd suggest a read of draft P802.1Qcc/D1.6, subclause U.1
> > of Annex U. Subclause U.1 assumes that bridges in the network use
> > 802.1Qbv, and then it poses the question of what an end-station
> > Talker should do. If the end-station also uses 802.1Qbv,
> > and that end-station transmits multiple streams, 802.1Qbv is
> > a bad implementation. The reason is that the scheduling
> > (i.e. order in time) of each stream cannot be controlled, which
> > in turn means that the CNC (network manager) cannot optimize
> > the 802.1Qbv schedules in bridges. The preferred technique
> > is to use "per-stream scheduling" in each Talker, so that
> > the CNC can create optimal schedules (i.e. best determinism).
> >
> > I'm aware of a small number of proprietary CNC implementations for
> > 802.1Qbv in bridges, and they are generally assuming per-stream
> > scheduling in end-stations (Talkers).
> >
> > The i210 NIC's LaunchTime can be used to implement per-stream
> > scheduling. I haven't looked at SO_TXTIME in detail, but it sounds
> > like per-stream scheduling. If so, then we already have the
> > fundamental building blocks for a complete implementation
> > of a time-aware end-station.
> >
> > If we answer the preceding question #1 as "end-station only",
> > I would recommend avoiding 802.1Qbv in this interface. There
> > isn't really anything wrong with it per se, but it would lead
> > developers down the wrong path.
> 
> In some situations, such as device nodes that each incorporate a small
> port count switch for the purpose of daisy-chaining a segment of the
> network, "end stations" must do a limited subset of local bridge
> management as well. I'm not sure how common this is going to be for
> industrial control applications, but I know there are audio and
> automotive applications built this way.
> 
> One particular device I am working with now provides all network
> access through a DSA switch chip with hardware Qbv support in addition
> to hardware Qav support. The SoC attached to it has no hardware timed
> launch (SO_TXTIME) support. In this case, although the proposed
> interface for Qbv is not *sufficient* to make a working time-aware end
> station, it does provide a usable building block to provide one. As
> with the credit-based shaping system, Talkers must provide an
> additional level of per-stream shaping as well, but this is largely
> (absent the jitter calculations, which are sort of a middle-level
> concern) independent of what sort of hardware offload of the
> scheduling is provided.
> 
> Both Qbv windows and timed launch support do roughly the same thing;
> they *delay* the launch of a hardware-queued frame so it can egress at
> a precisely specified time, and at least with the i210 and Qbv, ensure
> that no other traffic will be in-progress when that time arrives. For
> either to be used effectively, the application still has to prepare
> the frame slightly ahead-of-time and thus must have the same level of
> time-awareness. This is, again, largely independent of what kind of
> hardware offloading support is provided and is also largely
> independent of the network stack itself. Neither queue window
> management nor SO_TXTIME help the application present its
> time-sensitive traffic at the right time; that's a matter to be worked
> out with the application taking advantage of PTP and the OS scheduler.
> Whether you rely on managed windows or hardware launch time to provide
> the precisely correct amount of delay beyond that is immaterial to the
> application. In the absence of SO_TXTIME offloading (or even with it,
> and in the presence of sufficient OS scheduling jitter), an additional
> layer may need to be provided to ensure different applications' frames
> are queued in the correct order for egress during the window. Again,
> this could be a purely user-space application multiplexer or a
> separate qdisc module.
> 
> I wholeheartedly agree with you and Richard that we ought to
> eventually provide application-level APIs that don't require users to
> have deep knowledge of various 802.1Q intricacies. But I believe that
> the hardware offloading capability being provided now, and the variety
> of the way things are hooked up in real hardware, suggests that we
> ought to also build the support for the underlying protocols in layers
> so that we don't create unnecessary mismatches between offloading
> capability (which can be essential to overall network performance) and
> APIs, such that one configuration of offload support is privileged
> above others even when comparable scheduling accuracy could be
> provided by either.
> 
> In any case, only the cbs qdisc has been included in the post-RFC
> patch cover page for its last couple of iterations, so there is plenty
> of time to discuss how time-aware shaping, preemption, etc. management
> should occur beyond the cbs and SO_TXTIME proposals.
> 
> 
> Levi

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
  2017-09-29 20:44 Rodney Cummings
@ 2017-10-02 18:45 ` Levi Pearson
  2017-10-02 19:40   ` Rodney Cummings
  2017-10-02 23:06   ` Guedes, Andre
  0 siblings, 2 replies; 45+ messages in thread
From: Levi Pearson @ 2017-10-02 18:45 UTC (permalink / raw)
  To: Rodney Cummings
  Cc: Linux Kernel Network Developers, Vinicius Costa Gomes,
	Henrik Austad, richardcochran, jesus.sanchez-palencia,
	andre.guedes

Hi Rodney,

Some archives seem to have threaded it, but I have CC'd the
participants I saw in the original discussion thread since they may
not otherwise notice it amongst the normal traffic.

On Fri, Sep 29, 2017 at 2:44 PM, Rodney Cummings <rodney.cummings@ni.com> wrote:
> Hi,
>
> I am posting my reply to this thread after subscribing, so I apologize
> if the archive happens to attach it to the wrong thread.
>
> First, I'd like to say that I strongly support this RFC.
> We need Linux interfaces for IEEE 802.1 TSN features.
>
> Although I haven't looked in detail, the proposal for CBS looks good.
> My questions/concerns are more related to future work, such as for 802.1Qbv
> (scheduled traffic).
>
> 1. Question: From an 802.1 perspective, is this RFC intended to support
> end-station (e.g. NIC in host), bridges (i.e. DSA), or both?
>
> This is very important to clarify, because the usage of this interface
> will be very different for one or the other.
>
> For a bridge, the user code typically represents a remote management
> protocol (e.g. SNMP, NETCONF, RESTCONF), and this interface is
> expected to align with the specifications of 802.1Q clause 12,
> which serves as the information model for management. Historically,
> a standard kernel interface for management hasn't been viewed as
> essential, but I suppose it wouldn't hurt.

I don't think the proposal was meant to cover the case of non-local
switch hardware, but in addition to dsa and switchdev switch ICs
managed by embedded Linux-running SoCs, there are SoCs with embedded
small port count switches or even plain multiple NICs with software
bridging. Many of these embedded small port count switches have FQTSS
hardware that could potentially be configured by the proposed cbs
qdisc. This blurs the line somewhat between what is a "bridge" and
what is an "end-station" in 802.1Q terminology, but nevertheless these
devices exist, sometimes acting as an endpoint + a real bridge and
sometimes as just a system with multiple network interfaces.

> For an end station, the user code can be an implementation of SRP
> (802.1Q clause 35), or it can be an application-specific
> protocol (e.g. industrial fieldbus) that exchanges data according
> to P802.1Qcc clause 46. Either way, the top-level user interface
> is designed for individual streams, not queues and shapers. That
> implies some translation code between that top-level interface
> and this sort of kernel interface.
>
> As a specific end-station example, for CBS, 802.1Q-2014 subclause
> 34.6.1 requires "per-stream queues" in the Talker end-station.
> I don't see 34.6.1 represented in the proposed RFC, but that's
> okay... maybe per-stream queues are implemented in user code.
> Nevertheless, if that is the assumption, I think we need to
> clarify, especially in examples.

You're correct that the FQTSS credit-based shaping algorithm requires
per-stream shaping by Talker endpoints as well, but this is in
addition to the per-class shaping provided by most hardware shaping
implementations that I'm aware of in endpoint network hardware. I
agree that we need to document the need to provide this, but it can
definitely be built on top of the current proposal.

I believe the per-stream shaping could be managed either by a user
space application that manages all use of a streaming traffic class,
or through an additional qdisc module that performs per-stream
management on top of the proposed cbs qdisc, ensuring that the
frames-per-observation interval aspect of each stream's reservation is
obeyed. This becomes a fairly simple qdisc to implement on top of a
per-traffic class shaper, and could even be implemented with the help
of the timestamp that the SO_TXTIME proposal adds to skbuffs, but I
think keeping the layers separate provides more flexibility to
implementations and keeps management of various kinds of hardware
offload support simpler as well.
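
As a rough sketch of how such a per-stream layer might stamp per-packet
launch times: the SO_TXTIME/SCM_TXTIME names and the struct sock_txtime
layout below follow that proposal and are assumptions here, not settled
kernel API.

/* Sketch only: enable per-packet launch times on a socket and send one
 * frame with an absolute TAI launch time, assuming the SO_TXTIME
 * proposal's setsockopt + cmsg interface. */
#include <linux/net_tstamp.h>   /* struct sock_txtime (assumed layout) */
#include <sys/socket.h>
#include <string.h>
#include <stdint.h>
#include <time.h>               /* CLOCK_TAI */

static int enable_txtime(int fd)
{
	struct sock_txtime cfg = {
		.clockid = CLOCK_TAI,   /* launch times are absolute TAI */
		.flags   = 0,
	};
	return setsockopt(fd, SOL_SOCKET, SO_TXTIME, &cfg, sizeof(cfg));
}

static ssize_t send_at(int fd, const void *buf, size_t len, uint64_t txtime_ns)
{
	char control[CMSG_SPACE(sizeof(txtime_ns))] = { 0 };
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = control, .msg_controllen = sizeof(control),
	};
	struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);

	/* Attach the requested launch time as ancillary data. */
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type  = SCM_TXTIME;
	cm->cmsg_len   = CMSG_LEN(sizeof(txtime_ns));
	memcpy(CMSG_DATA(cm), &txtime_ns, sizeof(txtime_ns));

	return sendmsg(fd, &msg, 0);
}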

> 2. Suggestion: Do not assume that a time-aware (i.e. scheduled)
> end-station will always use 802.1Qbv.
>
> For those who are subscribed to the 802.1 mailing list,
> I'd suggest a read of draft P802.1Qcc/D1.6, subclause U.1
> of Annex U. Subclause U.1 assumes that bridges in the network use
> 802.1Qbv, and then it poses the question of what an end-station
> Talker should do. If the end-station also uses 802.1Qbv,
> and that end-station transmits multiple streams, 802.1Qbv is
> a bad implementation. The reason is that the scheduling
> (i.e. order in time) of each stream cannot be controlled, which
> in turn means that the CNC (network manager) cannot optimize
> the 802.1Qbv schedules in bridges. The preferred technique
> is to use "per-stream scheduling" in each Talker, so that
> the CNC can create optimal schedules (i.e. best determinism).
>
> I'm aware of a small number of proprietary CNC implementations for
> 802.1Qbv in bridges, and they are generally assuming per-stream
> scheduling in end-stations (Talkers).
>
> The i210 NIC's LaunchTime can be used to implement per-stream
> scheduling. I haven't looked at SO_TXTIME in detail, but it sounds
> like per-stream scheduling. If so, then we already have the
> fundamental building blocks for a complete implementation
> of a time-aware end-station.
>
> If we answer the preceding question #1 as "end-station only",
> I would recommend avoiding 802.1Qbv in this interface. There
> isn't really anything wrong with it per se, but it would lead
> developers down the wrong path.

In some situations, such as device nodes that each incorporate a small
port count switch for the purpose of daisy-chaining a segment of the
network, "end stations" must do a limited subset of local bridge
management as well. I'm not sure how common this is going to be for
industrial control applications, but I know there are audio and
automotive applications built this way.

One particular device I am working with now provides all network
access through a DSA switch chip with hardware Qbv support in addition
to hardware Qav support. The SoC attached to it has no hardware timed
launch (SO_TXTIME) support. In this case, although the proposed
interface for Qbv is not *sufficient* to make a working time-aware end
station, it does provide a usable building block to provide one. As
with the credit-based shaping system, Talkers must provide an
additional level of per-stream shaping as well, but this is largely
(absent the jitter calculations, which are sort of a middle-level
concern) independent of what sort of hardware offload of the
scheduling is provided.

Both Qbv windows and timed launch support do roughly the same thing;
they *delay* the launch of a hardware-queued frame so it can egress at
a precisely specified time, and at least with the i210 and Qbv, ensure
that no other traffic will be in-progress when that time arrives. For
either to be used effectively, the application still has to prepare
the frame slightly ahead-of-time and thus must have the same level of
time-awareness. This is, again, largely independent of what kind of
hardware offloading support is provided and is also largely
independent of the network stack itself. Neither queue window
management nor SO_TXTIME help the application present its
time-sensitive traffic at the right time; that's a matter to be worked
out with the application taking advantage of PTP and the OS scheduler.
Whether you rely on managed windows or hardware launch time to provide
the precisely correct amount of delay beyond that is immaterial to the
application. In the absence of SO_TXTIME offloading (or even with it,
and in the presence of sufficient OS scheduling jitter), an additional
layer may need to be provided to ensure different applications' frames
are queued in the correct order for egress during the window. Again,
this could be a purely user-space application multiplexer or a
separate qdisc module.
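
Purely as an illustration of the time-awareness the application needs
anyway (a sketch; the cycle, offset and lead-time values are arbitrary,
and it assumes the system's TAI clock is kept in sync with the PTP
network time by something like phc2sys):

/* Sketch: wake up slightly before the next transmit slot so the frame is
 * already queued when the gate opens or the launch time arrives. */
#include <time.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

static void wait_for_slot(uint64_t cycle_ns, uint64_t offset_ns, uint64_t lead_ns)
{
	struct timespec now, wake;
	uint64_t t, next;

	clock_gettime(CLOCK_TAI, &now);            /* PTP-aligned time */
	t = (uint64_t)now.tv_sec * NSEC_PER_SEC + now.tv_nsec;

	next = ((t / cycle_ns) + 1) * cycle_ns + offset_ns; /* next slot */
	next -= lead_ns;                           /* wake early to enqueue */

	wake.tv_sec  = next / NSEC_PER_SEC;
	wake.tv_nsec = next % NSEC_PER_SEC;
	clock_nanosleep(CLOCK_TAI, TIMER_ABSTIME, &wake, NULL);
	/* ...now build and send the frame for this slot... */
}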

I wholeheartedly agree with you and Richard that we ought to
eventually provide application-level APIs that don't require users to
have deep knowledge of various 802.1Q intricacies. But I believe that
the hardware offloading capability being provided now, and the variety
of the way things are hooked up in real hardware, suggests that we
ought to also build the support for the underlying protocols in layers
so that we don't create unnecessary mismatches between offloading
capability (which can be essential to overall network performance) and
APIs, such that one configuration of offload support is privileged
above others even when comparable scheduling accuracy could be
provided by either.

In any case, only the cbs qdisc has been included in the post-RFC
patch cover page for its last couple of iterations, so there is plenty
of time to discuss how time-aware shaping, preemption, etc. management
should occur beyond the cbs and SO_TXTIME proposals.


Levi

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers
@ 2017-09-29 20:44 Rodney Cummings
  2017-10-02 18:45 ` Levi Pearson
  0 siblings, 1 reply; 45+ messages in thread
From: Rodney Cummings @ 2017-09-29 20:44 UTC (permalink / raw)
  To: Linux Kernel Network Developers

Hi,

I am posting my reply to this thread after subscribing, so I apologize
if the archive happens to attach it to the wrong thread.

First, I'd like to say that I strongly support this RFC. 
We need Linux interfaces for IEEE 802.1 TSN features.

Although I haven't looked in detail, the proposal for CBS looks good. 
My questions/concerns are more related to future work, such as for 802.1Qbv
(scheduled traffic).

1. Question: From an 802.1 perspective, is this RFC intended to support 
end-station (e.g. NIC in host), bridges (i.e. DSA), or both?

This is very important to clarify, because the usage of this interface
will be very different for one or the other. 

For a bridge, the user code typically represents a remote management 
protocol (e.g. SNMP, NETCONF, RESTCONF), and this interface is 
expected to align with the specifications of 802.1Q clause 12, 
which serves as the information model for management. Historically,
a standard kernel interface for management hasn't been viewed as
essential, but I suppose it wouldn't hurt.

For an end station, the user code can be an implementation of SRP
(802.1Q clause 35), or it can be an application-specific 
protocol (e.g. industrial fieldbus) that exchanges data according
to P802.1Qcc clause 46. Either way, the top-level user interface
is designed for individual streams, not queues and shapers. That
implies some translation code between that top-level interface 
and this sort of kernel interface.

As a specific end-station example, for CBS, 802.1Q-2014 subclause 
34.6.1 requires "per-stream queues" in the Talker end-station.
I don't see 34.6.1 represented in the proposed RFC, but that's
okay... maybe per-stream queues are implemented in user code.
Nevertheless, if that is the assumption, I think we need to 
clarify, especially in examples.

2. Suggestion: Do not assume that a time-aware (i.e. scheduled) 
end-station will always use 802.1Qbv.

For those who are subscribed to the 802.1 mailing list, 
I'd suggest a read of draft P802.1Qcc/D1.6, subclause U.1 
of Annex U. Subclause U.1 assumes that bridges in the network use 
802.1Qbv, and then it poses the question of what an end-station
Talker should do. If the end-station also uses 802.1Qbv, 
and that end-station transmits multiple streams, 802.1Qbv is
a bad implementation. The reason is that the scheduling
(i.e. order in time) of each stream cannot be controlled, which
in turn means that the CNC (network manager) cannot optimize
the 802.1Qbv schedules in bridges. The preferred technique
is to use "per-stream scheduling" in each Talker, so that
the CNC can create optimal schedules (i.e. best determinism).

I'm aware of a small number of proprietary CNC implementations for 
802.1Qbv in bridges, and they are generally assuming per-stream
scheduling in end-stations (Talkers).

The i210 NIC's LaunchTime can be used to implement per-stream 
scheduling. I haven't looked at SO_TXTIME in detail, but it sounds
like per-stream scheduling. If so, then we already have the
fundamental building blocks for a complete implementation
of a time-aware end-station.

If we answer the preceding question #1 as "end-station only",
I would recommend avoiding 802.1Qbv in this interface. There
isn't really anything wrong with it per se, but it would lead
developers down the wrong path.

Rodney Cummings (National Instruments)
Editor, IEEE P802.1Qcc

---
Hi,

This patchset is an RFC on a proposal of how the Traffic Control subsystem can
be used to offload the configuration of traffic shapers into network devices
that provide support for them in HW. Our goal here is to start upstreaming
support for features related to the Time-Sensitive Networking (TSN) set of
standards into the kernel.

As part of this work, we've assessed previous public discussions related to TSN
enabling: patches from Henrik Austad (Cisco), the presentation from Eric Mann
at Linux Plumbers 2012, patches from Gangfeng Huang (National Instruments) and
the current state of the OpenAVNU project (https://github.com/AVnu/OpenAvnu/).

Please note that the patches provided as part of this RFC implement only what
is needed for 802.1Qav (FQTSS), but we'd like to take advantage of
this discussion and share our WIP ideas for the 802.1Qbv and 802.1Qbu interfaces
as well. The current patches only provide support for HW offload of the
configs.


Overview
========

Time-sensitive Networking (TSN) is a set of standards that aim to address
resource availability for providing bandwidth reservation and bounded latency
on Ethernet based LANs. The proposal described here aims to cover mainly what is
needed to enable the following standards: 802.1Qat, 802.1Qav, 802.1Qbv and
802.1Qbu.

The initial target of this work is the Intel i210 NIC, but other controllers'
datasheets were also taken into account, like the Renesas RZ/A1H / RZ/A1M group
and the Synopsys DesignWare Ethernet QoS controller.


Proposal
========

Feature-wise, what is covered here are configuration interfaces for HW
implementations of the Credit-Based shaper (CBS, 802.1Qav), Time-Aware shaper
(802.1Qbv) and Frame Preemption (802.1Qbu). CBS is a per-queue shaper, while
Qbv and Qbu must be configured per port, with the configuration covering all
queues. Given that these features are related to traffic shaping, and that the
traffic control subsystem already provides a queueing discipline that offloads
config into the device driver (i.e. mqprio), designing new qdiscs for the
specific purpose of offloading the config for each shaper seemed like a good
fit.

For steering traffic into the correct queues, we use the socket option
SO_PRIORITY and then a mechanism to map priority to traffic classes / Tx queues.
The qdisc mqprio is currently used in our tests.
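
As an illustration (a minimal sketch, not taken from the patches; error
handling omitted), an application selects its traffic class simply by
setting SO_PRIORITY on its socket; mqprio then maps that priority to a
traffic class and Tx queue:

/* Sketch: tag a socket's traffic with a given priority, which the
 * qdisc's priority -> traffic class map turns into a Tx queue choice
 * (e.g. prio 3 for SR class A in the testing example below). */
#include <sys/socket.h>

static int set_stream_priority(int fd, int prio)
{
	return setsockopt(fd, SOL_SOCKET, SO_PRIORITY, &prio, sizeof(prio));
}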

As for the shapers config interface:

 * CBS (802.1Qav)

   This patchset is proposing a new qdisc called 'cbs'. Its 'tc' cmd line is:
   $ tc qdisc add dev IFACE parent ID cbs locredit N hicredit M sendslope S \
     idleslope I

   Note that the parameters for this qdisc are the ones defined by the
   802.1Q-2014 spec, so no hardware specific functionality is exposed here.


 * Time-aware shaper (802.1Qbv):

   The idea we are currently exploring is to add a "time-aware", priority based
   qdisc, that also exposes the Tx queues available and provides a mechanism for
   mapping priority <-> traffic class <-> Tx queues in a similar fashion to
   mqprio. We are calling this qdisc 'taprio', and its 'tc' cmd line would be:

   $ tc qdisc add dev ens4 parent root handle 100 taprio num_tc 4 \
         map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3 \
         queues 0 1 2 3 \
         sched-file gates.sched [base-time <interval>] \
         [cycle-time <interval>] [extension-time <interval>]

   <file> is multi-line, with each line being of the following format:
   <cmd> <gate mask> <interval in nanoseconds>

   Qbv only defines one <cmd>: "S" for 'SetGates'

   For example:

   S 0x01 300
   S 0x03 500

   This means that there are two intervals: the first has the gate for
   traffic class 0 open for 300 nanoseconds, and the second has the gates
   for traffic classes 0 and 1 open for 500 nanoseconds.

   Additionally, an option to set just one entry of the gate control list will
   also be provided by 'taprio':

   $ tc qdisc (...) \
        sched-row <row number> <cmd> <gate mask> <interval>  \
        [base-time <interval>] [cycle-time <interval>] \
        [extension-time <interval>]


 * Frame Preemption (802.1Qbu):

   To control latency even further, it may prove useful to signal which
   traffic classes are preemptable. For that, 'taprio' provides the
   preemption command, so each traffic class can be marked as preemptable or not:

   $ tc qdisc (...) \
        preemption 0 1 1 1


 * Time-aware shaper + Preemption:

   As an example of how Qbv and Qbu can be used together, we may specify
   both the schedule and the preempt mask; this also allows the
   Set-Gates-and-Hold and Set-Gates-and-Release commands defined in the
   Qbu spec to be used:

   $ tc qdisc add dev ens4 parent root handle 100 taprio num_tc 4 \
         map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3 \
         queues 0 1 2 3 \
         preemption 0 1 1 1 \
         sched-file preempt_gates.sched

    <file> is multi-line, with each line being of the following format:
    <cmd> <gate mask> <interval in nanoseconds>

    For this case, two new commands are introduced:

    "H" for 'set gates and hold'
    "R" for 'set gates and release'

    H 0x01 300
    R 0x03 500



Testing this RFC
================

For testing the patches of this RFC only, you can refer to the samples and
helper script being added to samples/tsn/ and then use the 'mqprio' qdisc to
set up the priority to Tx queue mapping, together with the 'cbs' qdisc to
configure the HW shaper of the i210 controller:

1) Setup priorities to traffic classes to hardware queues mapping
$ tc qdisc replace dev enp3s0 parent root mqprio num_tc 3 \
     map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0

2) Check the scheme. You want to get the inner qdiscs' IDs from the bottom up
$ tc -g  class show dev enp3s0

Ex.:
+---(802a:3) mqprio
|    +---(802a:6) mqprio
|    +---(802a:7) mqprio
|
+---(802a:2) mqprio
|    +---(802a:5) mqprio
|
+---(802a:1) mqprio
     +---(802a:4) mqprio

 * Here '802a:4' is Tx Queue #0 and '802a:5' is Tx Queue #1.

3) Calculate CBS parameters for classes A and B, e.g. bandwidth for A is 20Mbps
   and for B is 10Mbps:
$ ./samples/tsn/calculate_cbs_params.py -A 20000 -a 1500 -B 10000 -b 1500
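
   As a cross-check (not the script's output format), assuming a 1Gbps link
   and the 802.1Q credit formulas hiCredit = maxInterferenceSize *
   idleSlope/portRate and loCredit = maxFrameSize * sendSlope/portRate, with
   maxInterferenceSize taken here as one 1500-byte lower-priority frame, the
   class A values work out as:

   sendslope = idleslope - portrate           = 20000 - 1000000          = -980000 kbit/s
   hicredit  = 1500 * idleslope / portrate    = 1500 * 20000/1000000     = 30 bytes
   locredit  = 1500 * sendslope / portrate    = 1500 * -980000/1000000   = -1470 bytes

   These match the values used in step 4 below.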

4) Configure CBS for traffic class A (priority 3) as provided by the script:
$ tc qdisc replace dev enp3s0 parent 802a:4 cbs locredit -1470 \
     hicredit 30 sendslope -980000 idleslope 20000

5) Configure CBS for traffic class B (priority 2):
$ tc qdisc replace dev enp3s0 parent 802a:5 cbs \
     locredit -1485 hicredit 31 sendslope -990000 idleslope 10000

6) Run Listener, compiled from samples/tsn/listener.c
$ ./listener -i enp3s0

7) Run Talker for class A (prio 3 here), compiled from samples/tsn/talker.c
$ ./talker -i enp3s0 -p 3

 * The bandwidth displayed on the listener output at this stage should be very
   close to the one configured for class A.

8) You can also run a Talker for class B (prio 2 here)
$ ./talker -i enp3s0 -p 2

 * The bandwidth displayed on the listener output now should increase to very
   close to the one configured for class A + class B.

Authors
=======
 - Andre Guedes <andre.guedes@xxxxxxxxx>
 - Ivan Briano <ivan.briano@xxxxxxxxx>
 - Jesus Sanchez-Palencia <jesus.sanchez-palencia@xxxxxxxxx>
 - Vinicius Gomes <vinicius.gomes@xxxxxxxxx>


Andre Guedes (2):
  igb: Add support for CBS offload
  samples/tsn: Add script for calculating CBS config

Jesus Sanchez-Palencia (1):
  sample: Add TSN Talker and Listener examples

Vinicius Costa Gomes (2):
  net/sched: Introduce the user API for the CBS shaper
  net/sched: Introduce Credit Based Shaper (CBS) qdisc

 drivers/net/ethernet/intel/igb/e1000_defines.h |  23 ++
 drivers/net/ethernet/intel/igb/e1000_regs.h    |   8 +
 drivers/net/ethernet/intel/igb/igb.h           |   6 +
 drivers/net/ethernet/intel/igb/igb_main.c      | 349 +++++++++++++++++++++++++
 include/linux/netdevice.h                      |   1 +
 include/uapi/linux/pkt_sched.h                 |  29 ++
 net/sched/Kconfig                              |  11 +
 net/sched/Makefile                             |   1 +
 net/sched/sch_cbs.c                            | 286 ++++++++++++++++++++
 samples/tsn/calculate_cbs_params.py            | 112 ++++++++
 samples/tsn/listener.c                         | 254 ++++++++++++++++++
 samples/tsn/talker.c                           | 136 ++++++++++
 12 files changed, 1216 insertions(+)
 create mode 100644 net/sched/sch_cbs.c
 create mode 100755 samples/tsn/calculate_cbs_params.py
 create mode 100644 samples/tsn/listener.c
 create mode 100644 samples/tsn/talker.c

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2017-10-23 17:26 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-01  1:26 [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers Vinicius Costa Gomes
2017-09-01  1:26 ` [RFC net-next 1/5] net/sched: Introduce the user API for the CBS shaper Vinicius Costa Gomes
2017-09-01  1:26 ` [RFC net-next 2/5] net/sched: Introduce Credit Based Shaper (CBS) qdisc Vinicius Costa Gomes
2017-09-08 13:43   ` Henrik Austad
2017-09-14  0:39     ` Vinicius Costa Gomes
2017-09-01  1:26 ` [RFC net-next 3/5] igb: Add support for CBS offload Vinicius Costa Gomes
2017-09-01  1:26 ` [RFC net-next 4/5] sample: Add TSN Talker and Listener examples Vinicius Costa Gomes
2017-09-01  1:26 ` [RFC net-next 5/5] samples/tsn: Add script for calculating CBS config Vinicius Costa Gomes
2017-09-01 13:03 ` [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers Richard Cochran
2017-09-01 16:12   ` Jesus Sanchez-Palencia
2017-09-01 16:53     ` Richard Cochran
2017-09-05  7:20     ` Richard Cochran
2017-09-07  5:34 ` Henrik Austad
2017-09-07 12:40   ` Richard Cochran
2017-09-07 15:27     ` Henrik Austad
2017-09-07 15:53       ` Richard Cochran
2017-09-07 16:18         ` Henrik Austad
2017-09-07 21:51           ` Guedes, Andre
2017-09-07 19:58   ` Guedes, Andre
2017-09-08  6:06     ` Henrik Austad
2017-09-08  1:29   ` Vinicius Costa Gomes
2017-09-12  4:56     ` Richard Cochran
2017-09-18  8:02 ` Richard Cochran
2017-09-18 11:46   ` Henrik Austad
2017-09-18 23:06   ` Vinicius Costa Gomes
2017-09-19  5:22     ` Richard Cochran
2017-09-19 13:14       ` Henrik Austad
2017-09-20  0:19       ` Vinicius Costa Gomes
2017-09-20  5:25         ` Richard Cochran
2017-10-18 22:37           ` Jesus Sanchez-Palencia
2017-10-19 20:39             ` Richard Cochran
2017-10-23 17:18               ` Jesus Sanchez-Palencia
2017-09-20  5:58         ` Richard Cochran
2017-09-18  8:12 ` Richard Cochran
2017-09-20  5:17   ` TSN Scorecard, was " levipearson
2017-09-20  5:49     ` Richard Cochran
2017-09-20 21:29       ` Jesus Sanchez-Palencia
2017-09-20  1:59 ` levipearson
2017-09-20  5:56   ` Richard Cochran
2017-09-29 20:44 Rodney Cummings
2017-10-02 18:45 ` Levi Pearson
2017-10-02 19:40   ` Rodney Cummings
2017-10-02 21:48     ` Levi Pearson
2017-10-02 22:52       ` Rodney Cummings
2017-10-02 23:06   ` Guedes, Andre

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).